Constrained Multiview Representation for Self-supervised Contrastive
Learning
- URL: http://arxiv.org/abs/2402.03456v1
- Date: Mon, 5 Feb 2024 19:09:33 GMT
- Title: Constrained Multiview Representation for Self-supervised Contrastive
Learning
- Authors: Siyuan Dai, Kai Ye, Kun Zhao, Ge Cui, Haoteng Tang, Liang Zhan
- Abstract summary: We introduce a novel approach predicated on representation distance-based mutual information (MI) for measuring the significance of different views.
We harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information.
- Score: 4.817827522417457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning constitutes a pivotal cornerstone in contemporary
deep learning paradigms, offering a conduit to elucidate distinctive features
within the latent space and interpret the deep models. Nevertheless, the
inherent complexity of anatomical patterns and the random nature of lesion
distribution in medical image segmentation pose significant challenges to the
disentanglement of representations and the understanding of salient features.
Methods guided by the maximization of mutual information, particularly within
the framework of contrastive learning, have demonstrated remarkable success and
superiority in decoupling densely intertwined representations. However, the
effectiveness of contrastive learning highly depends on the quality of the
positive and negative sample pairs, i.e., averaging mutual information
indiscriminately across multiple views can obstruct the learning strategy, so
view selection is vital. In this work, we introduce a novel approach
predicated on representation distance-based mutual information (MI)
maximization for measuring the significance of different views, aiming at
conducting more efficient contrastive learning and representation
disentanglement. Additionally, we introduce an MI re-ranking strategy for
representation selection, which benefits both continuous MI estimation and the
measurement of representation-significance distances. Specifically, we harness
multi-view representations extracted from the frequency domain, re-evaluating
their significance based on mutual information across varying frequencies,
thereby facilitating a multifaceted contrastive learning approach to bolster
semantic comprehension. Statistical results across five metrics
maximization-driven representation selection and steers the multi-view
contrastive learning process.
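The pipeline the abstract describes (frequency-domain multi-view extraction, then MI-based significance ranking of the views) can be sketched roughly as follows. This is a minimal NumPy illustration with hypothetical helper names (`frequency_views`, `rank_views_by_mi`) and an InfoNCE-style lower bound standing in as the MI proxy; it is not the authors' implementation.

```python
import numpy as np

def frequency_views(x, n_bands=4):
    """Split images into radial frequency-band views via FFT masking.
    x: (B, H, W) real array; returns a list of n_bands arrays of the same shape
    whose sum reconstructs x (the band masks partition the spectrum)."""
    F = np.fft.fftshift(np.fft.fft2(x), axes=(-2, -1))
    H, W = x.shape[-2:]
    yy, xx = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
    r = np.sqrt(xx**2 + yy**2)
    rmax = r.max()
    views = []
    for k in range(n_bands):
        lo, hi = k / n_bands * rmax, (k + 1) / n_bands * rmax
        # last band closes the interval so every frequency belongs to one band
        mask = (r >= lo) & ((r <= hi) if k == n_bands - 1 else (r < hi))
        band = np.fft.ifft2(np.fft.ifftshift(F * mask, axes=(-2, -1))).real
        views.append(band)
    return views

def rank_views_by_mi(view_embeddings, anchor, tau=0.1):
    """Score each view by an InfoNCE-style MI lower bound against the anchor
    embedding (higher = more informative) and return views ranked by score.
    view_embeddings: list of (B, D) arrays; anchor: (B, D)."""
    def normalize(z):
        return z / np.linalg.norm(z, axis=1, keepdims=True)
    a = normalize(anchor)
    scores = []
    for z in view_embeddings:
        sim = a @ normalize(z).T / tau                       # (B, B) logits
        logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        scores.append(float(np.mean(np.diag(logp))))         # diagonal = positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

In this sketch, a view whose embedding aligns with the anchor yields a higher InfoNCE score and is ranked earlier, mirroring the idea of re-evaluating view significance by mutual information before contrastive training.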
Related papers
- Independence Constrained Disentangled Representation Learning from Epistemological Perspective [13.51102815877287]
Disentangled Representation Learning aims to improve the explainability of deep learning methods by training a data encoder that identifies semantically meaningful latent variables in the data generation process.
There is no consensus regarding the objective of disentangled representation learning.
We propose a novel method for disentangled representation learning by employing an integration of mutual information constraint and independence constraint.
arXiv Detail & Related papers (2024-09-04T13:00:59Z) - Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning [10.630297877530614]
We propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning.
Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast by the correspondences to learn more general unsupervised representations.
Our method significantly outperforms the existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection.
arXiv Detail & Related papers (2024-07-02T07:35:21Z) - Rethinking Multi-view Representation Learning via Distilled Disentangling [34.14711778177439]
Multi-view representation learning aims to derive robust representations that are both view-consistent and view-specific from diverse data sources.
This paper presents an in-depth analysis of existing approaches in this domain, highlighting the redundancy between view-consistent and view-specific representations.
We propose an innovative framework for multi-view representation learning, which incorporates a technique we term 'distilled disentangling'.
arXiv Detail & Related papers (2024-03-16T11:21:24Z) - Revealing Multimodal Contrastive Representation Learning through Latent
Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Disentangling Multi-view Representations Beyond Inductive Bias [32.15900989696017]
We propose a novel multi-view representation disentangling method that ensures both interpretability and generalizability of the resulting representations.
Our experiments on four multi-view datasets demonstrate that our proposed method outperforms 12 comparison methods in terms of clustering and classification performance.
arXiv Detail & Related papers (2023-08-03T09:09:28Z) - Improving the Modality Representation with Multi-View Contrastive
Learning for Multimodal Sentiment Analysis [15.623293264871181]
This study investigates the improvement approaches of modality representation with contrastive learning.
We devise a three-stages framework with multi-view contrastive learning to refine representations for the specific objectives.
We conduct experiments on three open datasets, and the results demonstrate the advantage of our model.
arXiv Detail & Related papers (2022-10-28T01:25:16Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - A Variational Information Bottleneck Approach to Multi-Omics Data
Integration [98.6475134630792]
We propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations.
Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target.
Experiments on real-world datasets show that our method consistently achieves gain from data integration and outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-02-05T06:05:39Z) - Heterogeneous Contrastive Learning: Encoding Spatial Information for
Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire the future research of self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z) - Deep Partial Multi-View Learning [94.39367390062831]
We propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets)
We first provide a formal definition of completeness and versatility for multi-view representation.
We then theoretically prove the versatility of the learned latent representations.
arXiv Detail & Related papers (2020-11-12T02:29:29Z)
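Several entries in the list above (Variational Distillation for Multi-View Learning, the multi-omics IB approach) build on the variational information bottleneck objective: a task loss plus a β-weighted KL term that compresses the representation. A minimal NumPy sketch of the generic VIB loss follows; the `vib_loss` helper is a hypothetical illustration of the standard objective, not any listed paper's implementation.

```python
import numpy as np

def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    """Variational IB objective: task loss + beta * KL(q(z|x) || N(0, I)).
    mu, logvar: (B, D) Gaussian encoder outputs; logits: (B, K) decoder
    outputs; labels: (B,) integer class labels."""
    # Closed-form KL between a diagonal Gaussian and the standard normal prior
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    # Cross-entropy of the label under the decoder's softmax
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(labels)), labels]
    return float(np.mean(ce + beta * kl))
```

With `mu = 0` and `logvar = 0` the KL term vanishes and the loss reduces to plain cross-entropy; increasing β trades task accuracy for a more compressed representation.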
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.