CLAR: Contrastive Learning of Auditory Representations
- URL: http://arxiv.org/abs/2010.09542v1
- Date: Mon, 19 Oct 2020 14:15:31 GMT
- Title: CLAR: Contrastive Learning of Auditory Representations
- Authors: Haider Al-Tahan and Yalda Mohsenzadeh
- Abstract summary: We introduce various data augmentations suitable for auditory data and evaluate their impact on predictive performance.
We show that training with time-frequency audio features substantially improves the quality of the learned representations.
We illustrate that by combining all these methods and with substantially less labeled data, our framework (CLAR) achieves significant improvement on prediction performance.
- Score: 6.1424670675582576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rich visual representations using contrastive self-supervised
learning has been extremely successful. However, it is still a major question
whether we could use a similar approach to learn superior auditory
representations. In this paper, we expand on prior work (SimCLR) to learn
better auditory representations. We (1) introduce various data augmentations
suitable for auditory data and evaluate their impact on predictive performance,
(2) show that training with time-frequency audio features substantially
improves the quality of the learned representations compared to raw signals,
and (3) demonstrate that training with both supervised and contrastive losses
simultaneously improves the learned representations compared to self-supervised
pre-training followed by supervised fine-tuning. We show that by combining
all these methods, and with substantially less labeled data, our framework
(CLAR) achieves a significant improvement in predictive performance over the
supervised approach. Moreover, compared to the self-supervised approach, our
framework converges faster and learns significantly better representations.
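The simultaneous supervised-plus-contrastive training in point (3) amounts to optimizing one joint objective per batch rather than two sequential stages. A minimal numpy sketch of such a joint loss is shown below, using SimCLR's NT-Xent term plus standard cross-entropy; the helper names, the equal default weighting `alpha`, and the exact formulation are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent contrastive loss over 2N embeddings, where z[2i] and
    z[2i+1] are the two augmented views of sample i."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = len(z)
    pos = np.arange(n) ^ 1                             # positive partner: (0,1), (2,3), ...
    log_prob = sim[np.arange(n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

def cross_entropy(logits, labels):
    """Softmax cross-entropy for the supervised classification head."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def joint_loss(z, logits, labels, alpha=1.0):
    """Hypothetical combined objective: one gradient step optimizes
    the contrastive and supervised terms together."""
    return nt_xent(z) + alpha * cross_entropy(logits, labels)
```

Because both terms are minimized in the same step, labeled examples shape the representation from the start of training instead of only during a later fine-tuning phase.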
Related papers
- Improving the Modality Representation with Multi-View Contrastive
Learning for Multimodal Sentiment Analysis [15.623293264871181]
This study investigates the improvement approaches of modality representation with contrastive learning.
We devise a three-stages framework with multi-view contrastive learning to refine representations for the specific objectives.
We conduct experiments on three open datasets, and results show the advance of our model.
arXiv Detail & Related papers (2022-10-28T01:25:16Z) - RényiCL: Contrastive Representation Learning with Skew Rényi
Divergence [78.15455360335925]
We present a new robust contrastive learning scheme, coined RényiCL, which can effectively manage harder augmentations.
Our method is built upon the variational lower bound of Rényi divergence.
We show that Rényi contrastive learning objectives perform innate hard negative sampling and easy positive sampling simultaneously.
arXiv Detail & Related papers (2022-08-12T13:37:05Z) - SLIP: Self-supervision meets Language-Image Pre-training [79.53764315471543]
We study whether self-supervised learning can aid in the use of language supervision for visual representation learning.
We introduce SLIP, a multi-task learning framework for combining self-supervised learning and CLIP pre-training.
We find that SLIP enjoys the best of both worlds: better performance than self-supervision and language supervision.
arXiv Detail & Related papers (2021-12-23T18:07:13Z) - The Impact of Spatiotemporal Augmentations on Self-Supervised
Audiovisual Representation Learning [2.28438857884398]
We present a contrastive framework to learn audiovisual representations from unlabeled videos.
We find that lossy temporal transformations that do not corrupt the temporal coherency of videos are the most effective.
Compared to self-supervised models pre-trained only with sampling-based temporal augmentation, models pre-trained with our temporal augmentations achieve an approximately 6.5% gain in linear evaluation performance on the AVE dataset.
arXiv Detail & Related papers (2021-10-13T23:48:58Z) - Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z) - Self-supervised Text-independent Speaker Verification using Prototypical
Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z) - Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise-Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z) - A Simple Framework for Contrastive Learning of Visual Representations [116.37752766922407]
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that composition of data augmentations plays a critical role in defining effective predictive tasks.
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
arXiv Detail & Related papers (2020-02-13T18:50:45Z)
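The "composition of data augmentations" that SimCLR highlights can be illustrated by sampling a stochastic augmentation pipeline twice per input to produce the two correlated views fed to the contrastive loss. The 1-D crop-and-noise transforms below are hypothetical stand-ins for the paper's image augmentations, chosen only to keep the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_crop(x, size):
    """Take a random contiguous window of the given size from a 1-D signal."""
    start = rng.integers(0, len(x) - size + 1)
    return x[start:start + size]

def add_noise(x, scale=0.1):
    """Add Gaussian noise; one of the stochastic transforms in the pipeline."""
    return x + rng.normal(scale=scale, size=x.shape)

def two_views(x, crop=64):
    """Apply an independently sampled composition of augmentations twice,
    yielding the positive pair used by the contrastive objective."""
    aug = lambda s: add_noise(random_crop(s, crop))
    return aug(x), aug(x)

signal = rng.normal(size=128)
v1, v2 = two_views(signal)
```

Because each view is drawn independently, the two views differ while still originating from the same underlying sample, which is exactly what the contrastive loss treats as a positive pair.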