A Principled Framework for Multi-View Contrastive Learning
- URL: http://arxiv.org/abs/2507.06979v1
- Date: Wed, 09 Jul 2025 16:07:17 GMT
- Title: A Principled Framework for Multi-View Contrastive Learning
- Authors: Panagiotis Koromilas, Efthymios Georgiou, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis A. Nicolaou, Yannis Panagakis
- Abstract summary: Contrastive Learning (CL) is a leading paradigm in Self-Supervised Learning (SSL). Current CL methods handle additional views suboptimally by simply aggregating different pairwise objectives. We address these limitations through two novel loss functions: MV-InfoNCE and MV-DHEL.
- Score: 23.97266762318814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive Learning (CL), a leading paradigm in Self-Supervised Learning (SSL), typically relies on pairs of data views generated through augmentation. While multiple augmentations per instance (more than two) improve generalization in supervised learning, current CL methods handle additional views suboptimally by simply aggregating different pairwise objectives. This approach suffers from four critical limitations: (L1) it utilizes multiple optimization terms per data point, resulting in conflicting objectives, (L2) it fails to model all interactions across views and data points, (L3) it inherits fundamental limitations (e.g. alignment-uniformity coupling) from pairwise CL losses, and (L4) it prevents fully realizing the benefits of increased view multiplicity observed in supervised settings. We address these limitations through two novel loss functions: MV-InfoNCE, which extends InfoNCE to incorporate all possible view interactions simultaneously in one term per data point, and MV-DHEL, which decouples alignment from uniformity across views while scaling interaction complexity with view multiplicity. Both approaches are theoretically grounded: we prove they asymptotically optimize for alignment of all views and uniformity, providing principled extensions to multi-view contrastive learning. Our empirical results on ImageNet1K and three other datasets demonstrate that our methods consistently outperform existing multi-view approaches and effectively scale with increasing view multiplicity. We also apply our objectives to multimodal data and show that, in contrast to other contrastive objectives, they can scale beyond just two modalities. Most significantly, ablation studies reveal that MV-DHEL with five or more views effectively mitigates dimensionality collapse by fully utilizing the embedding space, thereby delivering multi-view benefits observed in supervised learning.
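The abstract contrasts its proposed losses with the common practice of aggregating pairwise objectives across views. As a rough illustration of that baseline only, the sketch below averages a standard pairwise InfoNCE loss over all ordered view pairs and also computes alignment and uniformity in a commonly used formulation. It is not the paper's MV-InfoNCE or MV-DHEL, whose definitions are not given on this page. All function names, the temperature of 0.1, the uniformity scale t=2, and the use of NumPy are assumptions made for illustration.

```python
import itertools

import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Project embeddings onto the unit hypersphere."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def info_nce(anchor, positive, temperature=0.1):
    """One-directional pairwise InfoNCE.

    anchor, positive: (batch, dim) unit-norm embeddings; row i of each
    array comes from the same instance. Other rows act as negatives.
    """
    logits = anchor @ positive.T / temperature            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def pairwise_aggregated_info_nce(views, temperature=0.1):
    """Baseline: average InfoNCE over every ordered pair of views.

    views: (n_views, batch, dim). Each instance contributes one loss term
    per view pair, which is the aggregation the abstract criticises.
    """
    n_views = views.shape[0]
    losses = [
        info_nce(views[a], views[b], temperature)
        for a, b in itertools.permutations(range(n_views), 2)
    ]
    return float(np.mean(losses))


def alignment(views):
    """Mean squared distance between views of the same instance."""
    n_views = views.shape[0]
    dists = [
        np.mean(np.sum((views[a] - views[b]) ** 2, axis=-1))
        for a, b in itertools.combinations(range(n_views), 2)
    ]
    return float(np.mean(dists))


def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs in z: (batch, dim)."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(z.shape[0], k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = l2_normalize(rng.normal(size=(4, 32, 64)))  # 4 views, 32 instances
    print("aggregated InfoNCE:", pairwise_aggregated_info_nce(views))
    print("alignment:", alignment(views))
    print("uniformity (view 0):", uniformity(views[0]))
```

With V views, this baseline produces V(V-1) loss terms per instance, which gives a concrete sense of limitation (L1) in the abstract: several optimization terms per data point rather than one.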
Related papers
- MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models. MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
- Balanced Multi-view Clustering [56.17836963920012]
Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures. The widely used joint training paradigm in MvC potentially does not fully leverage the multi-view information. We propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view.
arXiv Detail & Related papers (2025-01-05T14:42:47Z)
- DWCL: Dual-Weighted Contrastive Learning for Multi-View Clustering [9.945837095280256]
We introduce a novel model called Dual-Weighted Contrastive Learning (DWCL) for Multi-View Clustering. Specifically, to reduce the impact of unreliable cross-views, we introduce an innovative Best-Other (B-O) contrastive mechanism. We develop a dual weighting strategy that combines a view quality weight, reflecting the quality of each view, with a view discrepancy weight.
arXiv Detail & Related papers (2024-11-26T11:57:20Z)
- Partial Multi-View Clustering via Meta-Learning and Contrastive Feature Alignment [13.511433241138702]
Partial multi-view clustering (PVC) is a practical research problem that presents significant challenges for data analysis in real-world applications.
Existing clustering methods struggle to handle incomplete views effectively, leading to suboptimal clustering performance.
We propose a novel dual optimization framework based on contrastive learning, which aims to maximize the consistency of latent features in incomplete multi-view data.
arXiv Detail & Related papers (2024-11-14T19:16:01Z)
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
- URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering [28.776476995363048]
We propose a novel Unified and Robust Representation Learning for Incomplete Multi-View Clustering (URRL-IMVC) framework.
URRL-IMVC directly learns a unified embedding that is robust to view missing conditions by integrating information from multiple views and neighboring samples.
We extensively evaluate the proposed URRL-IMVC framework on various benchmark datasets, demonstrating its state-of-the-art performance.
arXiv Detail & Related papers (2024-07-12T09:35:25Z)
- Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild [9.380271109354474]
This work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views.
To the best of our knowledge, this could be the first successful attempt to handle the missing and unaligned data problem separately with different learning paradigms.
arXiv Detail & Related papers (2023-10-28T06:43:57Z)
- M$^3$Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition [80.21796574234287]
M$^3$Net is a matching-based framework for few-shot fine-grained (FS-FG) action recognition.
It incorporates multi-view encoding, multi-view matching, and multi-view fusion to facilitate embedding encoding, similarity matching, and decision making.
Explainable visualizations and experimental results demonstrate the superiority of M$^3$Net in capturing fine-grained action details.
arXiv Detail & Related papers (2023-08-06T09:15:14Z)
- Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Multi-view Multi-behavior Contrastive Learning in Recommendation [52.42597422620091]
Multi-behavior recommendation (MBR) aims to jointly consider multiple behaviors to improve the target behavior's performance.
We propose a novel Multi-behavior Multi-view Contrastive Learning Recommendation framework.
arXiv Detail & Related papers (2022-03-20T15:13:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.