Information Fusion: Scaling Subspace-Driven Approaches
- URL: http://arxiv.org/abs/2204.12035v1
- Date: Tue, 26 Apr 2022 02:16:01 GMT
- Title: Information Fusion: Scaling Subspace-Driven Approaches
- Authors: Sally Ghanem, and Hamid Krim
- Abstract summary: We seek to exploit the deep structure of multi-modal data to robustly characterize the group subspace distribution of the information using the Convolutional Neural Network (CNN) formalism.
Referred to as deep Multimodal Robust Group Subspace Clustering (DRoGSuRe), this approach is compared against the independently developed state-of-the-art approach named Deep Multimodal Subspace Clustering (DMSC).
- Score: 16.85310886805588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we seek to exploit the deep structure of multi-modal data to robustly characterize the group subspace distribution of the information using the Convolutional Neural Network (CNN) formalism. Upon unfolding the set of subspaces constituting each data modality and learning their corresponding encoders, an optimized integration of the generated inherent information is carried out to yield a characterization of the various classes. Referred to as deep Multimodal Robust Group Subspace Clustering (DRoGSuRe), this approach is compared against the independently developed state-of-the-art approach named Deep Multimodal Subspace Clustering (DMSC). Experiments on different multimodal datasets show that our approach is competitive and more robust in the presence of noise.
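The abstract describes the pipeline only at a high level. As a rough sketch of the generic deep multimodal subspace-clustering pattern it builds on (per-modality features, self-expressive coding, fused affinities, spectral clustering), the following uses a closed-form ridge surrogate in place of the learned encoders and self-expressive layer; the averaging fusion rule and the sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: per-modality self-expressive coding, then affinity fusion.
# Encoders are omitted; each modality's features are treated as given.
import numpy as np
from sklearn.cluster import SpectralClustering

def self_expressive_coefficients(X, lam=0.1):
    """Solve min_C ||X - XC||_F^2 + lam ||C||_F^2 (ridge surrogate).
    X: (d, n) feature matrix with one column per sample."""
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)          # discourage trivial self-representation
    return C

def fuse_and_cluster(modalities, n_clusters, lam=0.1):
    # Average the per-modality affinities (a simple illustrative fusion rule).
    A = sum(np.abs(C) + np.abs(C.T)
            for C in (self_expressive_coefficients(X, lam) for X in modalities))
    A /= len(modalities)
    return SpectralClustering(n_clusters=n_clusters,
                              affinity="precomputed").fit_predict(A)

rng = np.random.default_rng(0)
mods = [rng.normal(size=(32, 60)), rng.normal(size=(16, 60))]  # two modalities
labels = fuse_and_cluster(mods, n_clusters=3)
```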
Related papers
- Deep Modularity Networks with Diversity-Preserving Regularization [4.659251704980846]
We propose Deep Modularity Networks with Diversity-Preserving Regularization (DMoN-DPR), which introduces three novel regularization terms: distance-based for inter-cluster separation, variance-based for intra-cluster diversity, and entropy-based for balanced assignments.
Our method enhances clustering performance on benchmark datasets, achieving significant improvements in Normalized Mutual Information (NMI) and F1 scores.
These results demonstrate the effectiveness of incorporating diversity-preserving regularizations in creating meaningful and interpretable clusters, especially in feature-rich datasets.
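The abstract names the three regularizer families without formulas; below is a hedged numpy sketch of plausible instantiations (centroid separation, soft within-cluster variance, assignment-balance entropy). The exact DMoN-DPR terms may differ.

```python
# Hedged sketch of the three regularizer families named in the abstract.
import numpy as np

def regularizers(X, P):
    """X: (n, d) embeddings; P: (n, k) soft cluster assignments (rows sum to 1)."""
    k = P.shape[1]
    sizes = P.sum(axis=0) + 1e-9                      # soft cluster sizes
    centroids = (P.T @ X) / sizes[:, None]            # (k, d)

    # Distance-based: mean pairwise centroid distance (larger = better separated).
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    inter_sep = dists[~np.eye(k, dtype=bool)].mean()

    # Variance-based: soft within-cluster spread, rewarding intra-cluster diversity.
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
    intra_var = (P * sq_dist).sum() / P.sum()

    # Entropy-based: entropy of the cluster-size distribution (max = balanced).
    q = sizes / sizes.sum()
    balance = -(q * np.log(q)).sum()
    return inter_sep, intra_var, balance

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
P = rng.dirichlet(np.ones(4), size=100)               # random soft assignments
print(regularizers(X, P))
```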
arXiv Detail & Related papers (2025-01-23T08:05:59Z) - Structure-guided Deep Multi-View Clustering [13.593229506936682]
Deep multi-view clustering seeks to utilize the abundant information from multiple views to improve clustering performance.
Most existing clustering methods neglect to fully mine multi-view structural information.
We propose a structure-guided deep multi-view clustering model to explore the distribution of multi-view data.
arXiv Detail & Related papers (2025-01-17T12:42:30Z) - Preserving Modality Structure Improves Multi-Modal Learning [64.10085674834252]
Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings without relying on human annotations.
These methods often struggle to generalize well on out-of-domain data as they ignore the semantic structure present in modality-specific embeddings.
We propose a novel Semantic-Structure-Preserving Consistency approach to improve generalizability by preserving the modality-specific relationships in the joint embedding space.
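As an illustration of the stated idea, one simple way to preserve modality-specific relationships is to match the pairwise similarity structure of each modality's own embeddings against that of the joint embeddings; this is an assumption about the general mechanism, not the paper's exact loss.

```python
# Sketch: penalize divergence between joint-space and modality-specific
# pairwise similarity matrices for the same batch of samples.
import numpy as np

def cosine_sim_matrix(Z):
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-9)
    return Zn @ Zn.T

def structure_consistency_loss(joint, specific):
    """joint, specific: (n, d_*) embeddings of the same n samples."""
    return ((cosine_sim_matrix(joint) - cosine_sim_matrix(specific)) ** 2).mean()

rng = np.random.default_rng(2)
joint_emb = rng.normal(size=(32, 64))     # batch in the joint embedding space
video_emb = rng.normal(size=(32, 128))    # same batch, video-specific space
print(structure_consistency_loss(joint_emb, video_emb))
```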
arXiv Detail & Related papers (2023-08-24T20:46:48Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds on the amount of multimodal interaction.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
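A symmetric InfoNCE loss over temporally aligned clip/sentence pairs is one plausible form of such contrastive alignment; A2Summ's actual inter-sample and intra-sample losses are defined in the paper, so treat this as a generic sketch.

```python
# Hedged sketch: symmetric InfoNCE over n aligned video/text feature pairs.
import numpy as np

def info_nce(video, text, tau=0.07):
    """video, text: (n, d) features for n temporally aligned pairs."""
    v = video / np.linalg.norm(video, axis=1, keepdims=True)
    t = text / np.linalg.norm(text, axis=1, keepdims=True)
    logits = (v @ t.T) / tau                       # (n, n) similarity matrix
    log_p_v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    # Each clip should match its own sentence, and vice versa.
    return -0.5 * (np.diag(log_p_v).mean() + np.diag(log_p_t).mean())

rng = np.random.default_rng(3)
print(info_nce(rng.normal(size=(16, 32)), rng.normal(size=(16, 32))))
```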
arXiv Detail & Related papers (2023-03-13T17:01:42Z) - Subspace-Contrastive Multi-View Clustering [0.0]
We propose a novel Subspace-Contrastive Multi-View Clustering (SCMC) approach.
We employ view-specific auto-encoders to map the original multi-view data into compact features that capture its nonlinear structures.
To demonstrate the effectiveness of the proposed model, we conduct extensive comparative experiments on eight challenging datasets.
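As a toy stand-in for the view-specific auto-encoders, the sketch below compresses each view with a closed-form linear auto-encoder (via SVD); the paper's nonlinear encoders and contrastive subspace objective are omitted.

```python
# Sketch: a linear "auto-encoder" per view (tied encoder/decoder from SVD).
import numpy as np

def linear_autoencode(X, code_dim):
    """X: (n, d). Returns codes (n, code_dim) and mean reconstruction error."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:code_dim].T                 # encoder/decoder weights (tied)
    codes = Xc @ W
    err = np.linalg.norm(Xc - codes @ W.T) ** 2 / X.shape[0]
    return codes, err

rng = np.random.default_rng(4)
views = [rng.normal(size=(200, 50)), rng.normal(size=(200, 30))]
for i, (codes, err) in enumerate(linear_autoencode(V, 10) for V in views):
    print(f"view {i}: code shape {codes.shape}, reconstruction error {err:.3f}")
```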
arXiv Detail & Related papers (2022-10-13T07:19:37Z) - Consistency and Diversity induced Human Motion Segmentation [231.36289425663702]
We propose a novel Consistency and Diversity induced human Motion Segmentation (CDMS) algorithm.
Our model factorizes the source and target data into distinct multi-layer feature spaces.
A multi-mutual learning strategy is carried out to reduce the domain gap between the source and target data.
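The abstract does not spell out the mutual-learning objective; a standard way to measure and shrink a source/target domain gap is maximum mean discrepancy (MMD), shown here purely as an illustrative stand-in for that ingredient, not as CDMS's actual strategy.

```python
# Illustrative stand-in: RBF-kernel MMD^2 between source and target features.
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased MMD^2 estimate between samples X (n, d) and Y (m, d)."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(5)
source = rng.normal(0.0, 1.0, size=(100, 16))
target = rng.normal(0.5, 1.0, size=(100, 16))   # shifted domain
print(rbf_mmd2(source, target))                  # larger gap -> larger MMD^2
```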
arXiv Detail & Related papers (2022-02-10T06:23:56Z) - A Multiscale Environment for Learning by Diffusion [9.619814126465206]
We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model.
We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis.
To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Diffusion (M-LUND) clustering algorithm.
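The core diffusion machinery can be sketched as a row-stochastic transition matrix raised to several powers (time scales), with diffusion distances compared across scales; M-LUND's density-based mode detection and cluster extraction are omitted here.

```python
# Sketch: multiscale diffusion states on a Gaussian-kernel graph.
import numpy as np

def diffusion_states(X, times=(1, 4, 16), sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))            # Gaussian kernel affinities
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transitions
    return {t: np.linalg.matrix_power(P, t) for t in times}

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
for t, Pt in diffusion_states(X).items():
    # Diffusion distance between point 0 (cluster A) and point 59 (cluster B):
    d = np.linalg.norm(Pt[0] - Pt[59])
    print(f"t={t:2d}  cross-cluster diffusion distance {d:.4f}")
```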
arXiv Detail & Related papers (2021-01-31T17:46:19Z) - Deep Multimodal Fusion by Channel Exchanging [87.40768169300898]
This paper proposes a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities.
The validity of such an exchanging process is also guaranteed by sharing convolutional filters while keeping separate BN layers across modalities, which, as an added benefit, allows our multimodal architecture to be almost as compact as a unimodal network.
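A minimal sketch of that exchanging rule: channels whose BN scaling factor (gamma) falls below a threshold are deemed uninformative and replaced by the other modality's corresponding channels. The gammas are mocked below; in the paper they come from each sub-network's separate BN layers.

```python
# Sketch of BN-guided channel exchanging between two modality branches.
import numpy as np

def exchange_channels(feat_a, feat_b, gamma_a, gamma_b, thresh=0.02):
    """feat_*: (batch, channels, H, W) features; gamma_*: (channels,) BN scales."""
    out_a, out_b = feat_a.copy(), feat_b.copy()
    swap_a = gamma_a < thresh          # A's near-dead channels take B's signal
    swap_b = gamma_b < thresh
    out_a[:, swap_a] = feat_b[:, swap_a]
    out_b[:, swap_b] = feat_a[:, swap_b]
    return out_a, out_b

rng = np.random.default_rng(7)
rgb = rng.normal(size=(4, 8, 16, 16))     # e.g. RGB branch features
depth = rng.normal(size=(4, 8, 16, 16))   # e.g. depth branch features
g_rgb, g_depth = rng.uniform(0, 0.1, 8), rng.uniform(0, 0.1, 8)  # mocked gammas
rgb2, depth2 = exchange_channels(rgb, depth, g_rgb, g_depth)
```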
arXiv Detail & Related papers (2020-11-10T09:53:20Z) - Robust Group Subspace Recovery: A New Approach for Multi-Modality Data Fusion [18.202825916298437]
We propose a novel multi-modal data fusion approach based on group sparsity.
The proposed approach exploits the structural dependencies between the different modalities' data to cluster the associated target objects.
The resulting UoS structure is employed to classify newly observed data points, highlighting the abstraction capacity of the proposed method.
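A hedged sketch of the generic group-sparsity ingredient: a few proximal-gradient steps for a row-group-sparse self-representation, which exposes union-of-subspaces (UoS) structure. The paper's multi-modal coupling of the sparsity groups is not reproduced here.

```python
# Sketch: proximal gradient for min_C ||X - XC||_F^2 + lam * sum_i ||C_i,:||_2.
import numpy as np

def group_sparse_self_rep(X, lam=0.5, steps=200):
    n = X.shape[1]
    C = np.zeros((n, n))
    L = np.linalg.norm(X.T @ X, 2)            # step-size scale from spectral norm
    for _ in range(steps):
        G = X.T @ (X @ C - X)                 # (half) gradient of the fit term
        Z = C - G / L
        norms = np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12
        C = np.maximum(1 - lam / (L * norms), 0) * Z   # row-wise group shrinkage
        np.fill_diagonal(C, 0.0)              # forbid trivial self-representation
    return C

rng = np.random.default_rng(8)
X = rng.normal(size=(20, 40))
C = group_sparse_self_rep(X)
print("active rows:", int((np.abs(C).sum(axis=1) > 1e-6).sum()))
```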
arXiv Detail & Related papers (2020-06-18T16:31:31Z) - Agglomerative Neural Networks for Multi-view Clustering [109.55325971050154]
We propose an agglomerative analysis to approximate the optimal consensus view.
We present Agglomerative Neural Network (ANN) based on Constrained Laplacian Rank to cluster multi-view data directly.
Our evaluations against several state-of-the-art multi-view clustering approaches on four popular datasets demonstrate ANN's promising ability in view-consensus analysis.
arXiv Detail & Related papers (2020-05-12T05:39:10Z)