FusionAdapter for Few-Shot Relation Learning in Multimodal Knowledge Graphs
- URL: http://arxiv.org/abs/2510.00894v1
- Date: Wed, 01 Oct 2025 13:36:56 GMT
- Title: FusionAdapter for Few-Shot Relation Learning in Multimodal Knowledge Graphs
- Authors: Ran Liu, Yuan Fang, Xiaoli Li
- Abstract summary: Multimodal Knowledge Graphs (MMKGs) incorporate various modalities, including text and images, to enhance entity and relation representations. We propose FusionAdapter for the learning of few-shot relationships in MMKGs. By effectively adapting and fusing information from diverse modalities, FusionAdapter improves generalization to novel relations with minimal supervision.
- Score: 15.157416257109809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Knowledge Graphs (MMKGs) incorporate various modalities, including text and images, to enhance entity and relation representations. Notably, different modalities for the same entity often present complementary and diverse information. However, existing MMKG methods primarily align modalities into a shared space, which tends to overlook the distinct contributions of specific modalities, limiting their performance particularly in low-resource settings. To address this challenge, we propose FusionAdapter for the learning of few-shot relationships (FSRL) in MMKG. FusionAdapter introduces (1) an adapter module that enables efficient adaptation of each modality to unseen relations and (2) a fusion strategy that integrates multimodal entity representations while preserving diverse modality-specific characteristics. By effectively adapting and fusing information from diverse modalities, FusionAdapter improves generalization to novel relations with minimal supervision. Extensive experiments on two benchmark MMKG datasets demonstrate that FusionAdapter achieves superior performance over state-of-the-art methods.
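The two components named in the abstract, a per-modality adapter and a fusion step that keeps modality-specific signal, can be illustrated with a rough sketch. Everything below is an assumption made for illustration and is not taken from the paper: the bottleneck-with-residual adapter form, the zero-initialised up-projection, the weighted-average fusion, and all names (`make_adapter`, `adapt`, `fuse`, the modality keys) are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_adapter(dim, bottleneck, seed=0):
    """Hypothetical bottleneck adapter: dim -> bottleneck -> dim.

    The up-projection starts at zero, so an untrained adapter leaves its
    input unchanged (a common adapter initialisation, assumed here).
    """
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
    up = [[0.0] * bottleneck for _ in range(dim)]
    return {"down": down, "up": up}


def adapt(adapter, x):
    """Residual adapter: x + up(relu(down(x)))."""
    hidden = [max(0.0, h) for h in matvec(adapter["down"], x)]
    delta = matvec(adapter["up"], hidden)
    return [xi + di for xi, di in zip(x, delta)]


def fuse(embeddings, weights):
    """Weighted average of per-modality embeddings (assumed fusion rule).

    Each modality keeps its own adapted vector until this step, rather
    than being projected into a shared space first.
    """
    total = sum(weights[name] for name in embeddings)
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for name, vec in embeddings.items():
        share = weights[name] / total
        for i, value in enumerate(vec):
            fused[i] += share * value
    return fused


# Toy entity with three modality embeddings (values are made up).
entity = {
    "structure": [1.0, 0.0],
    "text": [0.0, 1.0],
    "image": [1.0, 1.0],
}
adapters = {name: make_adapter(dim=2, bottleneck=1, seed=i)
            for i, name in enumerate(entity)}
adapted = {name: adapt(adapters[name], vec) for name, vec in entity.items()}
fused = fuse(adapted, {"structure": 1.0, "text": 1.0, "image": 1.0})
```

In a few-shot setting, the small adapter matrices would be the only parameters updated from the support triples of a new relation. The sketch omits training entirely.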
Related papers
- BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
We reformulate multi-modal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA). Experiments on both synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2025-06-04T08:04:58Z)
- Gated Multimodal Graph Learning for Personalized Recommendation [9.466822984141086]
Multimodal recommendation has emerged as a promising solution to alleviate the cold-start and sparsity problems in collaborative filtering. We propose RLMultimodalRec, a lightweight and modular recommendation framework that combines graph-based user modeling with adaptive multimodal item encoding.
arXiv Detail & Related papers (2025-05-30T16:57:17Z)
- Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion [82.74585945197231]
Unified image fusion aims to integrate complementary information from multi-source images, enhancing image quality. Existing general image fusion methods incorporate explicit task identification to enable adaptation to different fusion tasks. We propose a novel unified image fusion framework named "TITA", which balances Task-invariant Interaction and Task-specific Adaptation.
arXiv Detail & Related papers (2025-04-07T15:08:35Z)
- StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation [63.31007867379312]
We propose StitchFusion, a framework that integrates large-scale pre-trained models directly as encoders and feature fusers. We introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding. Our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters.
arXiv Detail & Related papers (2024-08-02T15:41:16Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Task-Customized Mixture of Adapters for General Image Fusion [51.8742437521891]
General image fusion aims at integrating important information from multi-source images.
We propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model.
arXiv Detail & Related papers (2024-03-19T07:02:08Z)
- MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs.
MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation.
This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)