An Empirical Analysis on the Vulnerabilities of End-to-End Speech
Segregation Models
- URL: http://arxiv.org/abs/2206.09556v1
- Date: Mon, 20 Jun 2022 03:46:47 GMT
- Authors: Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma
- Abstract summary: We investigate ConvTasnet and DPT-Net to analyze how they perform a harmonic analysis of the input mixture.
We find that end-to-end networks are highly unstable, and perform poorly when confronted with deformations which are imperceptible to humans.
- Score: 0.8666275811953879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end learning models have demonstrated a remarkable capability in
performing speech segregation. Despite their wide scope of real-world
applications, little is known about the mechanisms they employ to group and
consequently segregate individual speakers. Knowing that harmonicity is a
critical cue for these networks to group sources, in this work, we perform a
thorough investigation on ConvTasnet and DPT-Net to analyze how they perform a
harmonic analysis of the input mixture. We perform ablation studies where we
apply low-pass, high-pass, and band-stop filters of varying pass-bands to
empirically analyze the harmonics most critical for segregation. We also
investigate how these networks decide which output channel to assign to an
estimated source by introducing discontinuities in synthetic mixtures. We find
that end-to-end networks are highly unstable, and perform poorly when
confronted with deformations which are imperceptible to humans. Replacing the
encoder in these networks with a spectrogram leads to lower overall
performance, but much higher stability. This work helps us understand what
information these networks rely on for speech segregation, and exposes two
sources of generalization errors. It also pinpoints the encoder as the part of
the network responsible for these errors, allowing for a redesign with expert
knowledge or transfer learning.
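The filter ablations described in the abstract can be illustrated with a minimal sketch, assuming a crude FFT-masking band-stop filter applied to a synthetic harmonic source. The paper's actual filter design, pass-bands, and test signals are not specified here; all names and parameters below are illustrative.

```python
import numpy as np

def band_stop(signal, sr, f_lo, f_hi):
    """Zero out FFT bins between f_lo and f_hi (Hz): a crude band-stop filter."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    spec[(freqs >= f_lo) & (freqs <= f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(signal))

def band_energy(x, sr, f_lo, f_hi):
    """Spectral energy of x in the band [f_lo, f_hi] Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return spec[(freqs >= f_lo) & (freqs <= f_hi)].sum()

sr = 8000
t = np.arange(sr) / sr  # one second of samples
# Synthetic "voiced" source: a 200 Hz fundamental plus decaying harmonics.
f0 = 200.0
source = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 6))

# Ablation: suppress the band containing harmonics 2-3 (400 and 600 Hz),
# then probe a separation model with the deformed mixture.
filtered = band_stop(source, sr, 350.0, 650.0)

print(band_energy(filtered, sr, 350.0, 650.0))      # ~0: stop band removed
print(band_energy(filtered, sr, 150.0, 250.0) > 0)  # fundamental intact
```

In an experiment of this kind, the deformed signal would then be fed to a pretrained model such as ConvTasnet or DPT-Net, and the drop in separation quality measured per pass-band to identify which harmonics the network depends on.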
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Gaussian Mixture Models for Affordance Learning using Bayesian Networks [50.18477618198277]
Affordances are fundamental descriptors of relationships between actions, objects and effects.
This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences.
arXiv Detail & Related papers (2024-02-08T22:05:45Z)
- Using Early Readouts to Mediate Featural Bias in Distillation [30.5299408494168]
Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks.
We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers.
arXiv Detail & Related papers (2023-10-28T04:58:15Z)
- Investigating Adversarial Vulnerability and Implicit Bias through Frequency Analysis [0.3985805843651649]
In this work, we investigate the relation between these perturbations and the implicit bias of neural networks trained with gradient-based algorithms.
We identify the minimal and most critical frequencies necessary for accurate classification or misclassification respectively for each input image and its adversarially perturbed version.
Our results provide empirical evidence that the network bias in Fourier space and the target frequencies of adversarial attacks are highly correlated and suggest new potential strategies for adversarial defence.
arXiv Detail & Related papers (2023-05-24T14:40:23Z)
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks, and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z)
- Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z)
- Dissecting U-net for Seismic Application: An In-Depth Study on Deep Learning Multiple Removal [3.058685580689605]
Seismic processing often requires suppressing multiples that appear when collecting data.
We present a deep learning-based alternative that provides competitive results, while reducing its usage's complexity.
arXiv Detail & Related papers (2022-06-24T07:16:27Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786]
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network.
arXiv Detail & Related papers (2020-05-16T23:23:22Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.