A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References
- URL: http://arxiv.org/abs/2508.14623v1
- Date: Wed, 20 Aug 2025 11:22:11 GMT
- Title: A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References
- Authors: Simon Dahl Jepsen, Mads Græsbøll Christensen, Jesper Rindom Jensen
- Abstract summary: This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs.
- Score: 16.172800007896285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation, when the training references contain noise, as is the case with the de facto benchmark WSJ0-2Mix. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs. To address this, a method is proposed to enhance references and augment the mixtures with WHAM!, aiming to train models that avoid learning noisy references. Two models trained on these enhanced datasets are evaluated with the non-intrusive NISQA.v2 metric. Results show reduced noise in separated speech but suggest that processing references may introduce artefacts, limiting overall quality gains. Negative correlation is found between SI-SDR and perceived noisiness across models on the WSJ0-2Mix and Libri2Mix test sets, underlining the conclusion from the derivation.
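The ceiling described in the abstract can be made concrete. Under the simplifying assumption of a reference r = s + n with noise n orthogonal to the clean source s, a model that outputs exactly s receives the optimal scaling alpha = P_s / (P_s + P_n), where P_s = ||s||^2 and P_n = ||n||^2, and its SI-SDR evaluates to 10 log10(P_s / P_n): exactly the SNR of the reference itself. This is an illustrative sketch consistent with the abstract, not the paper's own derivation. The NumPy snippet below (not the authors' code; the signal length and noise level are arbitrary choices) checks the identity numerically and also shows the incentive the abstract warns about: an estimate that retains most of the reference noise outscores a perfectly clean one.

    import numpy as np

    def si_sdr(estimate, reference):
        """Scale-Invariant SDR in dB, using the usual zero-mean convention."""
        estimate = estimate - estimate.mean()
        reference = reference - reference.mean()
        # Optimal scaling of the reference: projection of the estimate onto it.
        alpha = np.dot(estimate, reference) / np.dot(reference, reference)
        target = alpha * reference        # scaled-reference "target" component
        distortion = estimate - target    # everything else counts as distortion
        return 10.0 * np.log10(np.sum(target**2) / np.sum(distortion**2))

    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)        # stand-in for a clean speech source
    noise = 0.1 * rng.standard_normal(16000)  # reference noise at roughly 20 dB SNR
    noisy_ref = clean + noise

    # A perfectly clean estimate is capped near the reference SNR (about 20 dB):
    print(f"clean estimate:      {si_sdr(clean, noisy_ref):.1f} dB")
    # An estimate keeping 90% of the reference noise scores far higher (about 40 dB):
    print(f"noise-kept estimate: {si_sdr(clean + 0.9 * noise, noisy_ref):.1f} dB")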
Related papers
- Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation [4.631922211808715]
Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. We propose SAID (Semantics-Aware Implicit Denoising), a framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines.
arXiv Detail & Related papers (2026-02-17T04:58:21Z)
- Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models [53.79667447811139]
We show that a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader estimation uncertainty.
arXiv Detail & Related papers (2026-02-04T15:25:02Z)
- CARD: Correlation Aware Restoration with Diffusion [8.859116375276157]
Correlation Aware Restoration with Diffusion (CARD) is a training-free extension of DDRM that handles correlated Gaussian noise. To emphasize the importance of addressing correlated noise, we contribute CIN-D, a novel correlated noise dataset captured across diverse illumination conditions. CARD consistently outperforms existing methods across denoising, deblurring, and super-resolution tasks.
arXiv Detail & Related papers (2025-12-04T21:46:43Z)
- Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
- NGGAN: Noise Generation GAN Based on the Practical Measurement Dataset for Narrowband Powerline Communications [11.68930533749534]
We propose a novel generative adversarial network (GAN) called noise generation GAN (NGGAN) that learns the complicated characteristics of practically measured noise samples for data synthesis. Simulation results demonstrate that the noise samples generated by the proposed NGGAN closely match real noise samples.
arXiv Detail & Related papers (2025-10-02T09:47:56Z)
- Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content, known as hallucinations. Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness. NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z)
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning [12.660401635672969]
We propose Deep Regression against Noise via Contrastive Learning (DN-CL).
DN-CL employs two parameter-sharing encoders to embed data points under various data transformations into features that act as shields against noise.
Our experiments indicate that DN-CL demonstrates superior performance in handling both noisy and clean data.
arXiv Detail & Related papers (2024-06-21T03:13:40Z)
- Relation Modeling and Distillation for Learning with Noisy Labels [4.556974104115929]
This paper proposes a relation modeling and distillation framework that models inter-sample relationships via self-supervised learning.
The proposed framework can learn discriminative representations for noisy data, which results in superior performance compared with existing methods.
arXiv Detail & Related papers (2024-05-30T01:47:27Z)
- PLReMix: Combating Noisy Labels with Pseudo-Label Relaxed Contrastive Representation Learning [7.556169113399857]
We propose an end-to-end PLReMix framework by introducing a Pseudo-Label Relaxed (PLR) contrastive loss. The proposed PLR loss is pluggable, and we have integrated it into other LNL methods, observing improved performance.
arXiv Detail & Related papers (2024-02-27T15:22:20Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate the binary information of whether noise is present as a treatment variable in image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
- Improving Stability of LS-GANs for Audio and Speech Signals [70.15099665710336]
We show that encoding the departure from normality, computed in a latent vector space, into the generator optimization formulation helps to craft more comprehensive spectrograms.
We demonstrate the effectiveness of incorporating this metric into training for enhancing stability, with less mode collapse compared to baseline GANs.
arXiv Detail & Related papers (2020-08-12T17:41:25Z)