Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
- URL: http://arxiv.org/abs/2602.15896v1
- Date: Wed, 11 Feb 2026 13:32:09 GMT
- Title: Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens
- Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Wen Zhang, Huajun Chen
- Abstract summary: Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph structure information and multi-modal entity contents. We propose a token-based foundation model (TOFU) for MMKGR, which exhibits strong generalization across different MMKGs. Experimental results on 17 transductive, inductive, and fully-inductive MMKGs show that TOFU consistently outperforms strong KGFM and MMKGR baselines.
- Score: 60.15844119489298
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph structure information and multi-modal entity contents. Most existing works are designed for a transductive setting, which learns dataset-specific embeddings and struggles to generalize to new KGs. Recent knowledge graph foundation models (KGFMs) improve cross-KG transfer, but they mainly exploit structural patterns and ignore rich multi-modal signals. We address these gaps by proposing a token-based foundation model (TOFU) for MMKGR that exhibits strong generalization across different MMKGs. TOFU discretizes structural, visual, and textual information into modality-specific tokens, then employs a hierarchical fusion architecture with mixture-of-message mechanisms to process these tokens and obtain transferable features for MMKGR. Experimental results on 17 transductive, inductive, and fully-inductive MMKGs show that TOFU consistently outperforms strong KGFM and MMKGR baselines and delivers strong performance on unseen MMKGs.
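The abstract describes a two-stage recipe: quantize each modality into discrete tokens, then fuse per-modality messages with a learned gate. Below is a minimal PyTorch sketch of that shape; the module names, VQ-style codebook lookup, sizes, and softmax gating are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of the TOFU recipe described above: discretize each modality
# into tokens via a VQ-style codebook, then fuse per-modality messages with a
# learned softmax gate. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityTokenizer(nn.Module):
    """Quantize continuous modality features into discrete tokens by
    nearest-neighbor lookup in a learned codebook."""
    def __init__(self, feat_dim: int, codebook_size: int, token_dim: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, token_dim)
        self.codebook = nn.Embedding(codebook_size, token_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        z = self.proj(feats)                                   # (B, T, D)
        cb = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        ids = torch.cdist(z, cb).argmin(dim=-1)                # discrete token ids
        return self.codebook(ids)                              # token embeddings

class MixtureOfMessages(nn.Module):
    """Pool each modality's tokens, then softmax-gate the pooled messages
    into one fused entity representation."""
    def __init__(self, token_dim: int):
        super().__init__()
        self.gate = nn.Linear(token_dim, 1)

    def forward(self, modality_tokens: list) -> torch.Tensor:
        pooled = torch.stack([t.mean(dim=1) for t in modality_tokens], dim=1)
        weights = torch.softmax(self.gate(pooled), dim=1)      # (B, M, 1)
        return (weights * pooled).sum(dim=1)                   # (B, D)

# Toy usage: one tokenizer per modality (structure, vision, text), then fuse.
struct_tok = ModalityTokenizer(feat_dim=64, codebook_size=512, token_dim=128)
vis_tok = ModalityTokenizer(feat_dim=2048, codebook_size=512, token_dim=128)
txt_tok = ModalityTokenizer(feat_dim=768, codebook_size=512, token_dim=128)
fuse = MixtureOfMessages(token_dim=128)

B = 4  # batch of entities
fused = fuse([struct_tok(torch.randn(B, 8, 64)),
              vis_tok(torch.randn(B, 4, 2048)),
              txt_tok(torch.randn(B, 16, 768))])
print(fused.shape)  # torch.Size([4, 128])
```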
Related papers
- Graph4MM: Weaving Multimodal Learning with Structural Information [52.16646463590474]
Graphs provide powerful structural information for modeling intra- and inter-modal relationships. Previous works fail to distinguish multi-hop neighbors and treat the graph as a standalone modality. We propose Graph4MM, a graph-based multimodal learning framework.
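Graph4MM's stated complaint is that prior work flattens multi-hop neighborhoods. A minimal sketch of the hop-aware alternative, with a separate learned transform per hop (all names and shapes are assumed for exposition):

```python
# Illustrative sketch of hop-aware aggregation: 1-hop and 2-hop neighbors get
# separate learned transforms instead of being flattened together.
import torch
import torch.nn as nn

class HopAwareAggregator(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.w_self = nn.Linear(dim, dim)
        self.w_hop1 = nn.Linear(dim, dim)
        self.w_hop2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (N, N) row-normalized adjacency; adj @ (adj @ x) reaches 2 hops.
        h1 = adj @ x                 # messages from 1-hop neighbors
        h2 = adj @ h1                # messages from 2-hop neighbors
        return torch.relu(self.w_self(x) + self.w_hop1(h1) + self.w_hop2(h2))

# Toy usage on a 5-node graph with 16-dim node features.
N, D = 5, 16
adj = torch.rand(N, N)
adj = adj / adj.sum(dim=1, keepdim=True)   # row-normalize
out = HopAwareAggregator(D)(torch.randn(N, D), adj)
print(out.shape)  # torch.Size([5, 16])
```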
arXiv Detail & Related papers (2025-10-19T20:13:03Z)
- MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs [6.165053219836395]
We propose MMGraphRAG, which refines visual content through scene graphs and constructs a multimodal knowledge graph. It employs spectral clustering to achieve cross-modal entity linking and retrieves context along reasoning paths to guide the generative process. Experimental results show that MMGraphRAG achieves state-of-the-art performance on the DocBench and MMLongBench datasets.
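The spectral-clustering step for cross-modal entity linking can be illustrated in a few lines. The toy embeddings and the share-a-cluster linking rule below are assumptions; the paper's actual pipeline builds scene graphs first.

```python
# Minimal sketch of cross-modal entity linking via spectral clustering:
# cluster visual and textual entity embeddings jointly, then link entities
# that land in the same cluster. Data and linking rule are toy assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy embeddings: 4 visual entities and 4 textual counterparts (plus noise).
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 32))
textual = visual + 0.05 * rng.normal(size=(4, 32))
emb = np.vstack([visual, textual])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Precomputed affinity = cosine similarity shifted to be non-negative.
affinity = (emb @ emb.T + 1.0) / 2.0
labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                            random_state=0).fit_predict(affinity)

# Link visual entity i to textual entity j when they share a cluster.
links = [(i, j) for i in range(4) for j in range(4)
         if labels[i] == labels[4 + j]]
print(links)  # expected for this toy data: [(0, 0), (1, 1), (2, 2), (3, 3)]
```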
arXiv Detail & Related papers (2025-07-28T13:16:23Z)
- HERGC: Heterogeneous Experts Representation and Generative Completion for Multimodal Knowledge Graphs [6.615362280237532]
Multimodal knowledge graphs (MMKGs) enrich traditional knowledge graphs (KGs) by incorporating diverse modalities such as images and text. Multimodal knowledge graph completion (MMKGC) seeks to exploit these heterogeneous signals to infer missing facts. HERGC is a flexible Heterogeneous Experts Representation and Generative Completion framework for MMKGs.
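A hypothetical sketch of the "heterogeneous experts" half of HERGC: one expert encoder per modality and a learned router that mixes their outputs. Sizes and the gating form are assumptions, and the generative-completion half is omitted.

```python
# Sketch of heterogeneous per-modality experts with a learned router.
import torch
import torch.nn as nn

class HeterogeneousExperts(nn.Module):
    def __init__(self, dims: dict, out_dim: int):
        super().__init__()
        self.experts = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, out_dim), nn.ReLU(),
                              nn.Linear(out_dim, out_dim))
             for m, d in dims.items()})
        self.router = nn.Linear(out_dim, 1)

    def forward(self, feats: dict) -> torch.Tensor:
        outs = torch.stack([self.experts[m](x) for m, x in feats.items()], dim=1)
        gate = torch.softmax(self.router(outs), dim=1)   # (B, M, 1)
        return (gate * outs).sum(dim=1)                  # fused entity vector

model = HeterogeneousExperts({"image": 512, "text": 768, "graph": 200}, 256)
fused = model({"image": torch.randn(2, 512),
               "text": torch.randn(2, 768),
               "graph": torch.randn(2, 200)})
print(fused.shape)  # torch.Size([2, 256])
```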
arXiv Detail & Related papers (2025-06-01T04:12:25Z)
- Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains [66.55612528039894]
Knowledge Graphs (KGs) can serve as reliable knowledge sources for question answering (QA).
We present DoG (Decoding on Graphs), a novel framework that facilitates a deep synergy between LLMs and KGs.
Experiments across various KGQA tasks with different background KGs demonstrate that DoG achieves superior and robust performance.
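The "well-formed chains" idea can be made concrete with a toy constrained decoder: at every step, only edges that actually exist in the KG from the current entity may be emitted, so each generated chain is grounded by construction. The KG and the overlap-based scorer below are hypothetical stand-ins (the scorer plays the role of an LLM).

```python
# Toy constrained chain decoding over a KG: every step must follow a real edge.

KG = {  # entity -> list of (relation, entity) edges; a hypothetical toy KG
    "Paris": [("capital_of", "France"), ("located_in", "Europe")],
    "France": [("member_of", "EU"), ("capital", "Paris")],
    "EU": [("headquartered_in", "Brussels")],
}

def score(relation: str, question: str) -> float:
    """Stand-in for LLM next-step scoring: crude word overlap."""
    return len(set(relation.split("_")) & set(question.lower().split()))

def decode_chain(start: str, question: str, max_hops: int = 2):
    """Greedily walk the KG, always choosing a valid (well-formed) edge."""
    chain, node = [], start
    for _ in range(max_hops):
        edges = KG.get(node, [])
        if not edges:
            break
        rel, nxt = max(edges, key=lambda e: score(e[0], question))
        chain.append((node, rel, nxt))
        node = nxt
    return chain

print(decode_chain("Paris", "Which union is the country of Paris a member of?"))
# [('Paris', 'capital_of', 'France'), ('France', 'member_of', 'EU')]
```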
arXiv Detail & Related papers (2024-10-24T04:01:40Z)
- Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs. Existing MMKGC methods usually extract multi-modal features with pre-trained models. We introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities.
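As a sketch of the fuse step (assumed shapes, with a [CLS]-pooled Transformer over concatenated modality tokens; tokenization itself would mirror the VQ sketch shown for TOFU above):

```python
# Sketch of fine-grained token fusion: modality token sequences are
# concatenated behind a [CLS] token and fused by a shared Transformer.
import torch
import torch.nn as nn

class FineGrainedFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, *token_seqs: torch.Tensor) -> torch.Tensor:
        b = token_seqs[0].size(0)
        x = torch.cat([self.cls.expand(b, -1, -1), *token_seqs], dim=1)
        return self.encoder(x)[:, 0]   # fused entity embedding from [CLS]

fuser = FineGrainedFusion()
img_tokens = torch.randn(2, 6, 128)   # e.g., 6 visual tokens per entity
txt_tokens = torch.randn(2, 10, 128)  # e.g., 10 textual tokens per entity
print(fuser(img_tokens, txt_tokens).shape)  # torch.Size([2, 128])
```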
arXiv Detail & Related papers (2024-04-15T05:40:41Z)
- Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework. We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking. Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
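Modality-level noise masking is straightforward to sketch: with some probability, an entire modality's features are swapped for noise during training, so the downstream fusion model learns robustness to noisy or missing modalities. The masking probability and shapes below are illustrative assumptions.

```python
# Sketch of modality-level noise masking for robust multi-modal training.
import torch

def modality_noise_mask(feats: dict, p: float = 0.3) -> dict:
    """With probability p per modality, swap its features for Gaussian noise."""
    out = {}
    for name, x in feats.items():
        if torch.rand(()) < p:
            out[name] = torch.randn_like(x)   # masked: pure noise
        else:
            out[name] = x                     # kept: original features
    return out

feats = {"image": torch.randn(4, 512), "text": torch.randn(4, 768)}
noisy = modality_noise_mask(feats)
print({k: torch.equal(v, feats[k]) for k, v in noisy.items()})
```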
arXiv Detail & Related papers (2024-03-11T15:48:43Z)
- MACO: A Modality Adversarial and Contrastive Framework for Modality-missing Multi-modal Knowledge Graph Completion [18.188971531961663]
We propose a modality adversarial and contrastive framework (MACO) to solve the modality-missing problem in MMKGC.
MACO trains a generator and discriminator adversarially to generate missing modality features that can be incorporated into the MMKGC model.
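That adversarial recipe reduces to a small GAN loop: a generator maps an available modality (e.g., text) to the missing modality's feature space while a discriminator distinguishes real from generated features. The sizes and the single training step below are illustrative assumptions, not MACO's implementation.

```python
# Minimal GAN-style sketch: generate missing visual features from text.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 512))
disc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)

text = torch.randn(8, 768)        # available modality (e.g., text)
real_img = torch.randn(8, 512)    # real visual features of other entities

# Discriminator step: separate real features from generated ones.
fake_img = gen(text).detach()
d_loss = bce(disc(real_img), torch.ones(8, 1)) + \
         bce(disc(fake_img), torch.zeros(8, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: fool the discriminator; its output fills the missing slot.
g_loss = bce(disc(gen(text)), torch.ones(8, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
print(f"d_loss={d_loss.item():.3f}  g_loss={g_loss.item():.3f}")
```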
arXiv Detail & Related papers (2023-08-13T06:29:38Z)
- Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning [41.691551152718745]
We propose the Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT.
To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR, which mines the structural information underlying the knowledge graph.
Our SGMPT outperforms existing state-of-the-art models, proving the effectiveness of the designed strategies.
arXiv Detail & Related papers (2023-07-06T16:04:56Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer obtains SOTA performance on four datasets spanning multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)