Contrastive Separative Coding for Self-supervised Representation Learning
- URL: http://arxiv.org/abs/2103.00816v1
- Date: Mon, 1 Mar 2021 07:32:00 GMT
- Title: Contrastive Separative Coding for Self-supervised Representation Learning
- Authors: Jun Wang, Max W. Y. Lam, Dan Su, Dong Yu
- Abstract summary: We propose a self-supervised learning approach, namely Contrastive Separative Coding (CSC)
First, a multi-task separative encoder is built to extract a shared, separable, and discriminative embedding.
Second, we propose a powerful cross-attention mechanism performed over speaker representations across various interfering conditions.
- Score: 37.697375719184926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To extract robust deep representations from long sequential modeling of
speech data, we propose a self-supervised learning approach, namely Contrastive
Separative Coding (CSC). Our key finding is to learn such representations by
separating the target signal from contrastive interfering signals. First, a
multi-task separative encoder is built to extract shared separable and
discriminative embedding; secondly, we propose a powerful cross-attention
mechanism performed over speaker representations across various interfering
conditions, allowing the model to focus on and globally aggregate the most
critical information to answer the "query" (current bottom-up embedding) while
paying less attention to interfering, noisy, or irrelevant parts; lastly, we
form a new probabilistic contrastive loss which estimates and maximizes the
mutual information between the representations and the global speaker vector.
While most prior unsupervised methods have focused on predicting the future,
neighboring, or missing samples, we take a different perspective of predicting
the interfered samples. Moreover, our contrastive separative loss is free from
negative sampling. The experiment demonstrates that our approach can learn
useful representations achieving a strong speaker verification performance in
adverse conditions.
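The cross-attention step described in the abstract — answering a "query" (the current bottom-up embedding) by globally aggregating speaker representations across interfering conditions — can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the function names, embedding shapes, and the scaled dot-product scoring are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_speaker_vector(query, cond_reprs):
    """Cross-attention pooling over interfering conditions (hypothetical sketch).

    query      : (d,)   current bottom-up embedding (the "query")
    cond_reprs : (n, d) speaker representations under n interfering conditions

    Returns a (d,) globally aggregated speaker vector: conditions most
    relevant to the query get higher attention weight, while noisy or
    irrelevant conditions are down-weighted.
    """
    d = query.shape[0]
    scores = cond_reprs @ query / np.sqrt(d)  # (n,) scaled dot-product relevance
    weights = softmax(scores)                 # attention distribution over conditions
    return weights @ cond_reprs               # weighted global aggregation

# Toy usage: 4 interfering conditions, 8-dim embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal(8)
reprs = rng.standard_normal((4, 8))
g = aggregate_speaker_vector(q, reprs)
```

The aggregated vector `g` would then feed the paper's probabilistic contrastive loss, which maximizes mutual information between representations and the global speaker vector; that loss is not reproduced here.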
Related papers
- An Attention-based Framework for Fair Contrastive Learning [2.1605931466490795]
We propose a new method for fair contrastive learning that employs an attention mechanism to model bias-causing interactions.
Our attention mechanism avoids bias-causing samples that confound the model and focuses on bias-reducing samples that help learn semantically meaningful representations.
arXiv Detail & Related papers (2024-11-22T07:11:35Z)
- Regularized Contrastive Partial Multi-view Outlier Detection [76.77036536484114]
We propose a novel method named Regularized Contrastive Partial Multi-view Outlier Detection (RCPMOD)
In this framework, we utilize contrastive learning to learn view-consistent information and distinguish outliers by the degree of consistency.
Experimental results on four benchmark datasets demonstrate that our proposed approach could outperform state-of-the-art competitors.
arXiv Detail & Related papers (2024-08-02T14:34:27Z)
- Constrained Multiview Representation for Self-supervised Contrastive Learning [4.817827522417457]
We introduce a novel approach predicated on representation distance-based mutual information (MI) for measuring the significance of different views.
We harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information.
arXiv Detail & Related papers (2024-02-05T19:09:33Z)
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Self-supervised Text-independent Speaker Verification using Prototypical Momentum Contrastive Learning [58.14807331265752]
We show that better speaker embeddings can be learned by momentum contrastive learning.
We generalize the self-supervised framework to a semi-supervised scenario where only a small portion of the data is labeled.
arXiv Detail & Related papers (2020-12-13T23:23:39Z)
- Understanding Adversarial Examples from the Mutual Influence of Images and Perturbations [83.60161052867534]
We analyze adversarial examples by disentangling the clean images and adversarial perturbations, and analyze their influence on each other.
Our results suggest a new perspective towards the relationship between images and universal perturbations.
We are the first to achieve the challenging task of a targeted universal attack without utilizing original training data.
arXiv Detail & Related papers (2020-07-13T05:00:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.