Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
- URL: http://arxiv.org/abs/2503.04151v1
- Date: Thu, 06 Mar 2025 07:01:08 GMT
- Title: Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
- Authors: Jie Xu, Na Zhao, Gang Niu, Masashi Sugiyama, Xiaofeng Zhu
- Abstract summary: Real-world multi-view datasets are often heterogeneous and imperfect. We propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. In experiments, we employ it in unsupervised multi-view clustering, noisy-label classification, and as a plug-and-play module for cross-modal hashing retrieval.
- Score: 61.64052577026623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, multi-view learning (MVL) has garnered significant attention due to its ability to fuse discriminative information from multiple views. However, real-world multi-view datasets are often heterogeneous and imperfect, which usually makes MVL methods designed for specific combinations of views lack application potential and limits their effectiveness. To address this issue, we propose a novel robust MVL method (namely RML) with simultaneous representation fusion and alignment. Specifically, we introduce a simple yet effective multi-view transformer fusion network that transforms heterogeneous multi-view data into homogeneous word embeddings and then integrates multiple views via a sample-level attention mechanism to obtain a fused representation. Furthermore, we propose a simulated-perturbation-based multi-view contrastive learning framework that dynamically generates noise and unusable perturbations to simulate imperfect data conditions. The simulated noisy and unusable data yield two distinct fused representations, which we align with contrastive learning to learn discriminative and robust representations. Our RML is self-supervised and can also be applied to downstream tasks as a regularization. In experiments, we employ it in unsupervised multi-view clustering, noisy-label classification, and as a plug-and-play module for cross-modal hashing retrieval. Extensive comparison experiments and ablation studies validate the effectiveness of RML.
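The abstract describes two mechanisms concretely enough to sketch: heterogeneous views are projected into a shared token space and fused per sample by attention, and two simulated corruptions of the same inputs are fused and aligned with a contrastive loss. Below is a minimal PyTorch sketch of that pipeline, assuming an InfoNCE-style alignment loss, Gaussian noise plus random view-dropping as the simulated perturbations, and mean-pooled transformer outputs as the fused representation; all module names and hyperparameters are illustrative and this is not the authors' implementation.

```python
# Minimal sketch of the two ideas in the abstract, not the authors' code:
# (1) heterogeneous views -> shared token space -> sample-level attention fusion;
# (2) two perturbed copies of the input are fused and contrastively aligned.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenFusion(nn.Module):
    """Projects each view to a common embedding and fuses views per sample."""

    def __init__(self, view_dims, d_model=128, n_heads=4):
        super().__init__()
        # One linear "tokenizer" per heterogeneous view (assumed design).
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in view_dims)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, views):
        # views: list of (batch, view_dim) tensors -> (batch, n_views, d_model)
        tokens = torch.stack([p(v) for p, v in zip(self.proj, views)], dim=1)
        # Self-attention over one sample's view tokens = sample-level attention.
        fused_tokens = self.encoder(tokens)
        return fused_tokens.mean(dim=1)  # (batch, d_model) fused representation


def perturb(views, noise_std=0.1, drop_prob=0.3):
    """Simulate imperfect data: additive noise plus randomly zeroed-out views."""
    out = []
    for v in views:
        v = v + noise_std * torch.randn_like(v)            # noisy view
        mask = (torch.rand(v.size(0), 1) > drop_prob).float()
        out.append(v * mask)                               # "unusable" view
    return out


def info_nce(z1, z2, tau=0.5):
    """Align the two fused representations; positives sit on the diagonal."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    target = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, target)


if __name__ == "__main__":
    views = [torch.randn(32, 784), torch.randn(32, 256)]  # two toy views
    model = TokenFusion(view_dims=[784, 256])
    z1 = model(perturb(views))  # first simulated corruption
    z2 = model(perturb(views))  # second simulated corruption
    loss = info_nce(z1, z2)
    loss.backward()
    print(f"contrastive alignment loss: {loss.item():.4f}")
```

In this reading, robustness comes from forcing the two corrupted fusions of the same sample to agree, so the fused representation cannot depend on any single clean view.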
Related papers
- COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking [52.62149024881728]
We propose a contrastive one-stage transformer fusion framework for vision-language (VL) tracking.
We introduce a contrastive alignment strategy that maximizes mutual information between a video and its corresponding language description.
By leveraging a visual-linguistic transformer, we establish an efficient multi-modal fusion and reasoning mechanism.
arXiv Detail & Related papers (2025-04-02T03:12:38Z)
- Incomplete Multi-view Clustering via Diffusion Contrastive Generation [10.303281347345955]
We propose a novel incomplete multi-view clustering (IMVC) method called Diffusion Contrastive Generation (DCG).
DCG learns the distribution characteristics to enhance clustering by applying forward diffusion and reverse denoising processes to intra-view data.
It integrates instance-level and category-level interactive learning to exploit the consistent and complementary information available in multi-view data.
arXiv Detail & Related papers (2025-03-12T09:27:25Z)
- Generalizable and Robust Spectral Method for Multi-view Representation Learning [9.393841121141076]
Multi-view representation learning (MvRL) has garnered substantial attention in recent years. Graph Laplacian-based MvRL methods have demonstrated remarkable success in representing multi-view data. We introduce SpecRaGE, a novel fusion-based framework that integrates the strengths of graph Laplacian methods with the power of deep learning.
arXiv Detail & Related papers (2024-11-04T14:51:35Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, reflecting their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- A Framework for Fine-Tuning LLMs using Heterogeneous Feedback [69.51729152929413]
We present a framework for fine-tuning large language models (LLMs) using heterogeneous feedback.
First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF.
Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases.
arXiv Detail & Related papers (2024-08-05T23:20:32Z)
- Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild [9.380271109354474]
This work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views.
To the best of our knowledge, this could be the first successful attempt to handle the missing-data and unaligned-data problems separately with different learning paradigms.
arXiv Detail & Related papers (2023-10-28T06:43:57Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Deep Incomplete Multi-view Clustering with Cross-view Partial Sample and Prototype Alignment [50.82982601256481]
We propose a Cross-view Partial Sample and Prototype Alignment Network (CPSPAN) for Deep Incomplete Multi-view Clustering.
Unlike existing contrastive-based methods, we adopt pair-observed data alignment as 'proxy supervised signals' to guide instance-to-instance correspondence construction.
arXiv Detail & Related papers (2023-03-28T02:31:57Z)
- A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete-view scenario, our proposed method resists noise interference better than competing methods.
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
- MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion [4.36488705757229]
Multi-view representation learning is essential for many multi-view tasks, such as clustering and classification.
We propose a hybrid contrastive fusion algorithm to extract robust view-common representation from unlabeled data.
Experimental results demonstrate that the proposed method outperforms 12 competitive multi-view methods on four real-world datasets.
arXiv Detail & Related papers (2022-08-26T09:58:37Z)