Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph
Reasoning
- URL: http://arxiv.org/abs/2307.03591v1
- Date: Thu, 6 Jul 2023 16:04:56 GMT
- Title: Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph
Reasoning
- Authors: Ke Liang, Sihang Zhou, Yue Liu, Lingyuan Meng, Meng Liu, Xinwang Liu
- Abstract summary: We propose the graph Structure Guided Multimodal Pretrained Transformer for knowledge graph reasoning, termed SGMPT.
To the best of our knowledge, SGMPT is the first MPT model for multimodal KGR, which mines the structural information underlying the knowledge graph.
Our SGMPT outperforms existing state-of-the-art models, and the results prove the effectiveness of the designed strategies.
- Score: 41.691551152718745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal knowledge graphs (MKGs), which intuitively organize information in
various modalities, can benefit multiple practical downstream tasks, such as
recommendation systems, and visual question answering. However, most MKGs are
still far from complete, which motivates the flourishing of MKG reasoning
models. Recently, with the development of general artificial architectures, the
pretrained transformer models have drawn increasing attention, especially for
multimodal scenarios. However, the research of multimodal pretrained
transformer (MPT) for knowledge graph reasoning (KGR) is still at an early
stage. As the biggest difference between MKG and other multimodal data, the
rich structural information underlying the MKG still cannot be fully leveraged
in existing MPT models. Most of them only utilize the graph structure as a
retrieval map for matching images and texts connected with the same entity.
This manner hinders their reasoning performances. To this end, we propose the
graph Structure Guided Multimodal Pretrained Transformer for knowledge graph
reasoning, termed SGMPT. Specifically, the graph structure encoder is adopted
for structural feature encoding. Then, a structure-guided fusion module with
two different strategies, i.e., weighted summation and alignment constraint, is
first designed to inject the structural information into both the textual and
visual features. To the best of our knowledge, SGMPT is the first MPT model for
multimodal KGR, which mines the structural information underlying the knowledge
graph. Extensive experiments on FB15k-237-IMG and WN18-IMG demonstrate that
our SGMPT outperforms existing state-of-the-art models and prove the
effectiveness of the designed strategies.
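The abstract names two structure-guided fusion strategies, weighted summation and an alignment constraint, without giving details. A minimal sketch of what such strategies could look like, assuming a simple convex blend and a mean-squared-error alignment loss (both are assumptions, not the paper's actual formulation):

```python
import numpy as np

def weighted_summation(modal_feat, struct_feat, alpha=0.5):
    """Strategy 1: inject structural information by blending a modal
    (textual or visual) feature with the structural feature.
    alpha is a hypothetical mixing weight."""
    return alpha * modal_feat + (1.0 - alpha) * struct_feat

def alignment_constraint(modal_feat, struct_feat):
    """Strategy 2: an auxiliary loss pulling the modal feature toward the
    structural feature; mean-squared error is an assumption here."""
    return float(np.mean((modal_feat - struct_feat) ** 2))

text_feat = np.array([1.0, 0.0, 1.0, 0.0])
struct_feat = np.array([0.0, 1.0, 1.0, 0.0])

fused = weighted_summation(text_feat, struct_feat, alpha=0.5)
loss = alignment_constraint(text_feat, struct_feat)
```

With equal weighting the fused vector is the element-wise mean of the two features, while the alignment loss penalizes the dimensions where they disagree.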
Related papers
- VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings [14.36934698922473]
Vision-Language Models (VLMs) offer a powerful way to align diverse modalities within a shared embedding space.
We propose Vision-Language Knowledge Graph Embeddings (VL-KGE), a framework that integrates cross-modal alignment from VLMs with structured relational modeling.
arXiv Detail & Related papers (2026-03-02T22:18:48Z)
- Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens [60.15844119489298]
Multi-modal knowledge graph reasoning (MMKGR) aims to predict the missing links by exploiting both graph structure information and multi-modal entity contents.
We propose a token-based foundation model (TOFU) for MMKGR, which exhibits strong generalization across different MMKGs.
Experimental results on 17 transductive, inductive, and fully-inductive MMKGs show that TOFU consistently outperforms strong KGFM and MMKGR baselines.
arXiv Detail & Related papers (2026-02-11T13:32:09Z)
- Graph4MM: Weaving Multimodal Learning with Structural Information [52.16646463590474]
Graphs provide powerful structural information for modeling intra- and inter-modal relationships.
Previous works fail to distinguish multi-hop neighbors and treat the graph as a standalone modality.
We propose Graph4MM, a graph-based multimodal learning framework.
arXiv Detail & Related papers (2025-10-19T20:13:03Z)
- DiffusionCom: Structure-Aware Multimodal Diffusion Model for Multimodal Knowledge Graph Completion [15.898786167134997]
We propose a structure-aware multimodal Diffusion model for multimodal knowledge graph Completion (DiffusionCom).
DiffusionCom is trained using both generative and discriminative losses for the generator, while the feature extractor is optimized exclusively with discriminative loss.
Experiments on the FB15k-237-IMG and WN18-IMG datasets demonstrate that DiffusionCom outperforms state-of-the-art models.
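The summary states that the generator is trained with both generative and discriminative losses while the feature extractor sees only the discriminative loss. A toy sketch of how such a split objective might be wired up, with a hypothetical trade-off weight `lam` that is not from the paper:

```python
def generator_objective(generative_loss, discriminative_loss, lam=1.0):
    # The generator is trained on both objectives; lam is a hypothetical
    # trade-off weight, not a value reported by the paper.
    return generative_loss + lam * discriminative_loss

def extractor_objective(generative_loss, discriminative_loss):
    # The feature extractor is optimized with the discriminative loss only.
    return discriminative_loss

gen_total = generator_objective(0.8, 0.2, lam=0.5)  # 0.8 + 0.5 * 0.2
ext_total = extractor_objective(0.8, 0.2)
```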
arXiv Detail & Related papers (2025-04-09T02:50:37Z)
- Transformer-Based Multimodal Knowledge Graph Completion with Link-Aware Contexts [3.531533402602335]
Multimodal knowledge graph completion (MMKGC) aims to predict missing links in multimodal knowledge graphs (MMKGs).
Existing MMKGC approaches primarily extend traditional knowledge graph embedding (KGE) models.
We propose a novel approach that integrates Transformer-based KGE models with cross-modal context generated by pre-trained VLMs.
arXiv Detail & Related papers (2025-01-26T22:23:14Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
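The structural sparsification step described here, removing potentially uninformative or noisy edges from a neighborhood, can be illustrated with a simple stand-in: drop edges whose endpoint features fall below a similarity threshold. The cosine-similarity criterion and the threshold are assumptions for illustration, not the paper's actual scoring function:

```python
import numpy as np

def sparsify_neighborhood(features, edges, threshold=0.2):
    """Keep only edges whose endpoint features have cosine similarity at or
    above the threshold -- a simple stand-in for structural sparsification."""
    kept = []
    for u, v in edges:
        a, b = features[u], features[v]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim >= threshold:
            kept.append((u, v))
    return kept

feats = {0: np.array([1.0, 0.0]), 1: np.array([0.9, 0.1]), 2: np.array([0.0, 1.0])}
edges = [(0, 1), (0, 2)]
kept = sparsify_neighborhood(feats, edges, threshold=0.2)
```

Here the orthogonal pair (0, 2) is pruned while the near-parallel pair (0, 1) survives, leaving a cleaner neighborhood for the subsequent self-contrasting step.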
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion [51.80447197290866]
We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
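Tokenizing continuous multi-modal features into fine-grained discrete tokens is commonly done by nearest-neighbor lookup against a codebook (as in vector quantization). A minimal sketch under that assumption; the codebook here is hypothetical, not MyGO's learned one:

```python
import numpy as np

def tokenize(feature, codebook):
    """Map a continuous modality feature to the index of its nearest
    codebook entry -- a minimal sketch of discrete tokenization."""
    dists = np.linalg.norm(codebook - feature, axis=1)
    return int(np.argmin(dists))

codebook = np.array([[0.0, 0.0],
                     [1.0, 0.0],
                     [0.0, 1.0]])
token = tokenize(np.array([0.9, 0.1]), codebook)
```

The resulting token indices can then feed a cross-modal entity encoder in place of raw continuous features.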
arXiv Detail & Related papers (2024-04-15T05:40:41Z)
- Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z)
- Contextualized Structural Self-supervised Learning for Ontology Matching [0.9402105308876642]
We introduce a novel self-supervised learning framework called LaKERMap.
LaKERMap capitalizes on the contextual and structural information of concepts by integrating implicit knowledge into transformers.
The findings from our innovative approach reveal that LaKERMap surpasses state-of-the-art systems in terms of alignment quality and inference time.
arXiv Detail & Related papers (2023-10-05T18:51:33Z)
- Pre-training Transformers for Knowledge Graph Completion [81.4078733132239]
We introduce a novel inductive KG representation model (iHT) for learning transferable representation for knowledge graphs.
iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers.
Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models.
arXiv Detail & Related papers (2023-03-28T02:10:37Z)
- IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling [3.867363075280544]
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data.
A new model, Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM), is developed.
The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
arXiv Detail & Related papers (2023-01-06T10:08:11Z)
- A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal [57.8455911689554]
Knowledge graph reasoning (KGR) aims to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs).
It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems.
arXiv Detail & Related papers (2022-12-12T08:40:04Z)
- Knowledge Graph Completion with Pre-trained Multimodal Transformer and Twins Negative Sampling [13.016173217017597]
We propose a VisualBERT-enhanced Knowledge Graph Completion model, VBKGC for short.
VBKGC could capture deeply fused multimodal information for entities and integrate them into the KGC model.
We conduct extensive experiments to show the outstanding performance of VBKGC on the link prediction task.
arXiv Detail & Related papers (2022-09-15T06:50:31Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- Multi-modal Entity Alignment in Hyperbolic Space [13.789898717291251]
We propose a novel multi-modal entity alignment approach, Hyperbolic Multi-modal Entity Alignment (HMEA).
We first adopt the Hyperbolic Graph Convolutional Networks (HGCNs) to learn structural representations of entities.
We then combine the structure and visual representations in the hyperbolic space and use the aggregated embeddings to predict potential alignment results.
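Combining representations "in the hyperbolic space" typically means aggregating in the Euclidean tangent space and projecting into the Poincaré ball via the exponential map at the origin. A sketch under that standard formulation; the simple averaging of the two embeddings is an assumption, not necessarily HMEA's aggregation:

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c:
    projects a tangent (Euclidean) vector into hyperbolic space."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v.copy()
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

# Average structural and visual embeddings in the tangent space, then map
# the result into the ball (averaging is an illustrative assumption).
struct_emb = np.array([0.3, 0.4])
visual_emb = np.array([0.1, 0.0])
combined = exp_map_zero(0.5 * (struct_emb + visual_emb))
```

Because tanh is bounded by 1, the mapped point always lies strictly inside the unit ball, as required for Poincaré embeddings.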
arXiv Detail & Related papers (2021-06-07T13:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.