Related papers: Progressively Modality Freezing for Multi-Modal Entity Alignment

Progressively Modality Freezing for Multi-Modal Entity Alignment

URL: http://arxiv.org/abs/2407.16168v1
Date: Tue, 23 Jul 2024 04:22:30 GMT
Title: Progressively Modality Freezing for Multi-Modal Entity Alignment
Authors: Yani Huang, Xuefeng Zhang, Richong Zhang, Junfan Chen, Jaein Kim,
Abstract summary: We propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignmentrelevant features. Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency. Empirical evaluations across nine datasets confirm PMF's superiority.
Score: 27.77877721548588
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Modal Entity Alignment aims to discover identical entities across heterogeneous knowledge graphs. While recent studies have delved into fusion paradigms to represent entities holistically, the elimination of features irrelevant to alignment and modal inconsistencies is overlooked, which are caused by inherent differences in multi-modal features. To address these challenges, we propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignmentrelevant features and enhances multi-modal feature fusion. Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency. Empirical evaluations across nine datasets confirm PMF's superiority, demonstrating stateof-the-art performance and the rationale for freezing modalities. Our code is available at https://github.com/ninibymilk/PMF-MMEA.

Related papers

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation [104.59740403500132]
Multi-modal image segmentation faces real-world deployment challenges from incomplete/corrupted modalities degrading performance.<n>We propose a unified modality-relax segmentation network (UniMRSeg) through hierarchical self-supervised compensation (HSSC)<n>Our approach hierarchically bridges representation gaps between complete and incomplete modalities across input, feature and output levels.
arXiv Detail & Related papers (2025-09-19T17:29:25Z)
Learning to Fuse: Modality-Aware Adaptive Scheduling for Robust Multimodal Foundation Models [0.0]
Modality-Aware Adaptive Fusion Scheduling (MA-AFS) learns to dynamically modulate the contribution of each modality on a per-instance basis.<n>Our work highlights the importance of adaptive fusion and opens a promising direction toward reliable and uncertainty-aware multimodal learning.
arXiv Detail & Related papers (2025-06-15T05:57:45Z)
FedRecon: Missing Modality Reconstruction in Distributed Heterogeneous Environments [7.646878242748392]
FedRecon is first method targeting simultaneous missing modality reconstruction and Non-IID adaptation in multimodal learning. Our approach first employs a lightweight Multimodal Variational Autoencoder (MVAE) to reconstruct missing modalities. We introduce a strategy employing global generator freezing to prevent catastrophic forgetting, which in turn mitigates Non-IID fluctuations.
arXiv Detail & Related papers (2025-04-14T07:04:10Z)
DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning [7.947217265041953]
Multimodal representation learning aims to capture both shared and complementary semantic information across multiple modalities. We introduce DecAlign, a novel hierarchical cross-modal alignment framework designed to decouple multimodal representations into modality-unique (heterogeneous) and modality-common (homogeneous) features. Our experiments on four widely used multimodal benchmarks demonstrate that DecAlign consistently outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-03-14T21:47:48Z)
Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations [19.731611716111566]
We propose a Multimodal fusion approach for learning modality-Exclusive and modality-Agnostic representations. We introduce a predictive self-attention module to capture reliable context dynamics within modalities. A hierarchical cross-modal attention module is designed to explore valuable element correlations among modalities. A double-discriminator strategy is presented to ensure the production of distinct representations in an adversarial manner.
arXiv Detail & Related papers (2024-07-06T04:36:48Z)
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD) It aims to detect salient objects from arbitrary modalities, eg RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild. NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities. We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment [38.574204922793626]
We present a further analysis of visual modality incompleteness, benchmarking latest MMEA models on our proposed dataset MMEA-UMVM. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise, and exhibit performance oscillations or declines at high rates of missing modality. We introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities.
arXiv Detail & Related papers (2023-07-30T12:16:49Z)
Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently. We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process. Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
Provable Dynamic Fusion for Low-Quality Multimodal Data [94.39538027450948]
Dynamic multimodal fusion emerges as a promising learning paradigm. Despite its widespread use, theoretical justifications in this field are still notably lacking. This paper provides theoretical understandings to answer this question under a most popular multimodal fusion framework from the generalization perspective. A novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF) is proposed, which can improve the performance in terms of classification accuracy and model robustness.
arXiv Detail & Related papers (2023-06-03T08:32:35Z)
IMF: Interactive Multimodal Fusion Model for Link Prediction [13.766345726697404]
We introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets.
arXiv Detail & Related papers (2023-03-20T01:20:02Z)
MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs. MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation. This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z)
Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN) We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. Model takes two bimodal pairs as input due to known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.