An Empirical Analysis on the Vulnerabilities of End-to-End Speech
Segregation Models
- URL: http://arxiv.org/abs/2206.09556v1
- Date: Mon, 20 Jun 2022 03:46:47 GMT
- Authors: Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma
- Abstract summary: We investigate ConvTasnet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture.
We find that end-to-end networks are highly unstable, and perform poorly when confronted with deformations which are imperceptible to humans.
- Score: 0.8666275811953879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end learning models have demonstrated a remarkable capability in
performing speech segregation. Despite their wide scope of real-world
applications, little is known about the mechanisms they employ to group and
consequently segregate individual speakers. Knowing that harmonicity is a
critical cue for these networks to group sources, in this work, we perform a
thorough investigation on ConvTasnet and DPT-Net to analyze how they perform a
harmonic analysis of the input mixture. We perform ablation studies where we
apply low-pass, high-pass, and band-stop filters of varying pass-bands to
empirically analyze the harmonics most critical for segregation. We also
investigate how these networks decide which output channel to assign to an
estimated source by introducing discontinuities in synthetic mixtures. We find
that end-to-end networks are highly unstable, and perform poorly when
confronted with deformations which are imperceptible to humans. Replacing the
encoder in these networks with a spectrogram leads to lower overall
performance, but much higher stability. This work helps us understand what
information these networks rely on for speech segregation, and exposes two
sources of generalization errors. It also pinpoints the encoder as the part of
the network responsible for these errors, allowing for a redesign with expert
knowledge or transfer learning.
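The filter ablations described in the abstract can be illustrated with a minimal sketch, assuming a crude FFT-masking band-stop filter applied to a synthetic harmonic source. The paper's actual filter design, pass-bands, and test signals are not specified here; all names and parameters below are illustrative.

```python
import numpy as np

def band_stop(signal, sr, f_lo, f_hi):
    """Zero out FFT bins between f_lo and f_hi (Hz): a crude band-stop filter."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spec[(freqs >= f_lo) & (freqs <= f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(signal))

def band_energy(x, sr, f_lo, f_hi):
    """Spectral energy of x in the band [f_lo, f_hi] Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return spec[(freqs >= f_lo) & (freqs <= f_hi)].sum()

sr = 8000
t = np.arange(sr) / sr  # one second of samples
# Synthetic "voiced" source: a 200 Hz fundamental plus decaying harmonics.
f0 = 200.0
source = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 6))

# Ablation: suppress the band containing harmonics 2-3 (400 and 600 Hz),
# then probe a separation model with the deformed mixture.
filtered = band_stop(source, sr, 350.0, 650.0)

print(band_energy(filtered, sr, 350.0, 650.0))      # ~0: stop band removed
print(band_energy(filtered, sr, 150.0, 250.0) > 0)  # fundamental intact
```

In an experiment of this kind, the deformed signal would then be fed to a pretrained model such as ConvTasnet or DPT-Net, and the drop in separation quality measured per pass-band to identify which harmonics the network depends on.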
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
- Using Early Readouts to Mediate Featural Bias in Distillation [30.5299408494168]
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks.
We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers.
arXiv Detail & Related papers (2023-10-28T04:58:15Z)
- Investigating Adversarial Vulnerability and Implicit Bias through Frequency Analysis [0.3985805843651649]
In this work, we investigate the relation between these perturbations and the implicit bias of neural networks trained with gradient-based algorithms.
We identify the minimal and most critical frequencies necessary for accurate classification or misclassification respectively for each input image and its adversarially perturbed version.
Our results provide empirical evidence that the network bias in Fourier space and the target frequencies of adversarial attacks are highly correlated and suggest new potential strategies for adversarial defence.
arXiv Detail & Related papers (2023-05-24T14:40:23Z)
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks, and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z)
- Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z)
- Dissecting U-net for Seismic Application: An In-Depth Study on Deep Learning Multiple Removal [3.058685580689605]
Seismic processing often requires suppressing multiples that appear when collecting data.
We present a deep learning-based alternative that provides competitive results, while reducing its usage's complexity.
arXiv Detail & Related papers (2022-06-24T07:16:27Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786]
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network.
arXiv Detail & Related papers (2020-05-16T23:23:22Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.