Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations
- URL: http://arxiv.org/abs/2112.10950v1
- Date: Tue, 21 Dec 2021 02:50:53 GMT
- Title: Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations
- Authors: Melikasadat Emami, Dung Tran, Kazuhito Koishida
- Abstract summary: We propose an augmented contrastive SSL framework to learn invariant representations from unlabeled data.
Our method applies various perturbations to the unlabeled input data and utilizes contrastive learning to learn representations robust to such perturbations.
- Score: 28.511060004984895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving generalization is a major challenge in audio classification due to
labeled data scarcity. Self-supervised learning (SSL) methods tackle this by
leveraging unlabeled data to learn useful features for downstream
classification tasks. In this work, we propose an augmented contrastive SSL
framework to learn invariant representations from unlabeled data. Our method
applies various perturbations to the unlabeled input data and utilizes
contrastive learning to learn representations robust to such perturbations.
Experimental results on the Audioset and DESED datasets show that our framework
significantly outperforms state-of-the-art SSL and supervised learning methods
on sound/event classification tasks.
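The framework described above pairs perturbations of unlabeled audio with a contrastive objective that pulls embeddings of two views of the same clip together. As an illustrative sketch only (not the authors' implementation), a SimCLR-style NT-Xent loss over two augmented views can be written as:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive (NT-Xent) loss over two augmented views.

    z1, z2: (batch, dim) embeddings; row i of z1 and row i of z2 come from
    the same audio clip under two different perturbations and form the
    positive pair. All other rows in the combined batch are negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2n, dim)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity space
    sim = z @ z.T / temperature                         # (2n, 2n) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # exclude self-similarity
    # the positive for row i is row (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy: -log softmax(sim)[i, pos[i]], averaged over the batch
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), pos]).mean()
```

Embeddings of two perturbed copies of each clip (e.g. one with added noise, one time-shifted) would be fed in as `z1` and `z2`; minimizing the loss makes the representation invariant to those perturbations. The perturbation set and encoder are the paper's contribution and are not reproduced here.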
Related papers
- DIDA: Denoised Imitation Learning based on Domain Adaptation [28.36684781402964]
We focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise.
We propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data.
Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods.
arXiv Detail & Related papers (2024-04-04T11:29:05Z)
- Self-Supervised Learning for Anomalous Sound Detection [0.43512163406551996]
State-of-the-art anomalous sound detection (ASD) systems are often trained by using an auxiliary classification task to learn an embedding space.
A new state-of-the-art performance for the DCASE2023 ASD dataset is obtained that outperforms all other published results on this dataset by a large margin.
arXiv Detail & Related papers (2023-12-15T07:16:12Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Channel-Wise Contrastive Learning for Learning with Noisy Labels [60.46434734808148]
We introduce channel-wise contrastive learning (CWCL) to distinguish authentic label information from noise.
Unlike conventional instance-wise contrastive learning (IWCL), CWCL tends to yield more nuanced and resilient features aligned with the authentic labels.
Our strategy is twofold: firstly, using CWCL to extract pertinent features to identify cleanly labeled samples, and secondly, progressively fine-tuning using these samples.
arXiv Detail & Related papers (2023-08-14T06:04:50Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- More Speaking or More Speakers? [17.143456510764576]
Self-training (ST) and self-supervised learning (SSL) methods have demonstrated strong improvements in automatic speech recognition (ASR).
In this work, we aim to analyse the effect of the number of speakers in the training data on a recent SSL algorithm (wav2vec 2.0) and a recent ST algorithm (slimIPL).
Our findings suggest that SSL requires a large amount of unlabeled data to produce high-accuracy results, while ST requires a sufficient number of speakers in the labelled data, especially in the low-resource setting.
arXiv Detail & Related papers (2022-11-02T03:50:40Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z) - Collaborative Intelligence Orchestration: Inconsistency-Based Fusion of
Semi-Supervised Learning and Active Learning [60.26659373318915]
Active learning (AL) and semi-supervised learning (SSL) are two effective, but often isolated, means to alleviate the data-hungry problem.
We propose an innovative inconsistency-based virtual adversarial algorithm to further investigate SSL-AL's potential superiority.
Two real-world case studies visualize the practical industrial value of applying and deploying the proposed data sampling algorithm.
arXiv Detail & Related papers (2022-06-07T13:28:43Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
- Unsupervised Contrastive Learning of Sound Event Representations [30.914808451327403]
Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data.
In this work, we explore unsupervised contrastive learning as a way to learn sound event representations.
Our results suggest that unsupervised contrastive pre-training can mitigate the impact of data scarcity and increase robustness against noisy labels.
arXiv Detail & Related papers (2020-11-15T19:50:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.