Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects
- URL: http://arxiv.org/abs/2501.15900v1
- Date: Mon, 27 Jan 2025 09:49:08 GMT
- Title: Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects
- Authors: Victor Deng, Changhong Wang, Gael Richard, Brian McFee,
- Abstract summary: We investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models.
By applying parameterized audio effects, we analyze the correlation between the deformation trajectories and the effect strength in the embedding space.
We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional.
- Score: 4.202522944371801
- License:
- Abstract: In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional. This shows that pre-trained audio embeddings do not globally linearize the effects. Our empirical results on instrument classification downstream tasks confirm that projecting out the estimated deformation directions cannot generally improve the robustness of pre-trained embeddings to audio effects.
Related papers
- Unveiling and Mitigating Bias in Audio Visual Segmentation [9.427676046134374]
Community researchers have developed a range of advanced audio-visual segmentation models to improve the quality of sounding objects' masks.
While masks created by these models may initially appear plausible, they occasionally exhibit anomalies with incorrect grounding logic.
We attribute this to real-world inherent preferences and distributions as a simpler signal for learning than the complex audio-visual grounding.
arXiv Detail & Related papers (2024-07-23T16:55:04Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Understanding and Mitigating the Label Noise in Pre-training on
Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio
Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - A Study on Robustness to Perturbations for Representations of
Environmental Sound [16.361059909912758]
We evaluate two embeddings -- YAMNet, and OpenL$3$ on monophonic (UrbanSound8K) and polyphonic (SONYC UST) datasets.
We imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new embeddings with three distance measures.
arXiv Detail & Related papers (2022-03-20T01:04:38Z) - On Dynamic Noise Influence in Differentially Private Learning [102.6791870228147]
Private Gradient Descent (PGD) is a commonly used private learning framework, which noises based on the Differential protocol.
Recent studies show that emphdynamic privacy schedules can improve at the final iteration, yet yet theoreticals of the effectiveness of such schedules remain limited.
This paper provides comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions.
arXiv Detail & Related papers (2021-01-19T02:04:00Z) - Influence Functions in Deep Learning Are Fragile [52.31375893260445]
influence functions approximate the effect of samples in test-time predictions.
influence estimates are fairly accurate for shallow networks.
Hessian regularization is important to get highquality influence estimates.
arXiv Detail & Related papers (2020-06-25T18:25:59Z) - Exploring Quality and Generalizability in Parameterized Neural Audio
Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.