MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion
- URL: http://arxiv.org/abs/2404.09468v1
- Date: Mon, 15 Apr 2024 05:40:41 GMT
- Title: MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion
- Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Huajun Chen, Wen Zhang
- Abstract summary: We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
- Score: 51.80447197290866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal knowledge graphs (MMKG) store structured world knowledge containing rich multi-modal descriptive information. To overcome their inherent incompleteness, multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given MMKGs, leveraging both structural information from the triples and multi-modal information of the entities. Existing MMKGC methods usually extract multi-modal features with pre-trained models and employ a fusion module to integrate multi-modal features with triple prediction. However, this often results in a coarse handling of multi-modal data, overlooking the nuanced, fine-grained semantic details and their interactions. To tackle this shortfall, we introduce a novel framework MyGO to process, fuse, and augment the fine-grained modality information from MMKGs. MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models, underlining its superior performance. Code and data are available at https://github.com/zjukg/MyGO
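As a reading aid, here is a minimal, hypothetical sketch of the pipeline the abstract describes: modality data is quantized into fine-grained discrete tokens, a Transformer-style cross-modal entity encoder consumes the token sequence, and an InfoNCE-style contrastive term sharpens entity representations. The vocabulary size, layer sizes, the [ENT] summary token, and the exact loss are illustrative assumptions, not the released implementation at https://github.com/zjukg/MyGO.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEntityEncoder(nn.Module):
    """Illustrative sketch: encode an entity from a sequence of fine-grained
    discrete modality tokens (e.g. quantized image patches and sub-word text
    tokens). Vocabulary size, depth, and the summary token are assumptions."""
    def __init__(self, vocab_size=8192, dim=256, n_layers=2, n_heads=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, dim)
        self.ent_token = nn.Parameter(torch.randn(1, 1, dim))   # summary token per entity
        layer = nn.TransformerEncoderLayer(dim, n_heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, token_ids):                               # (batch, seq_len) of token ids
        x = self.token_emb(token_ids)
        x = torch.cat([self.ent_token.expand(x.size(0), -1, -1), x], dim=1)
        return self.encoder(x)[:, 0]                            # entity representation

def fine_grained_contrastive_loss(view_a, view_b, tau=0.07):
    """InfoNCE over two views of the same batch of entities -- one common way to
    realize 'fine-grained contrastive learning'; the paper's objective may differ."""
    a, b = F.normalize(view_a, dim=-1), F.normalize(view_b, dim=-1)
    logits = a @ b.t() / tau
    return F.cross_entropy(logits, torch.arange(a.size(0)))

# Toy usage: dropout inside the encoder makes two passes act as two stochastic views.
encoder = CrossModalEntityEncoder()
tokens = torch.randint(0, 8192, (8, 32))        # 8 entities, 32 discrete modality tokens each
loss = fine_grained_contrastive_loss(encoder(tokens), encoder(tokens))
```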
Related papers
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
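The EGMS entry above mentions dual multimodal encoders with shared weights processing text-image and entity-image inputs concurrently. The snippet below is a hypothetical sketch of the weight-sharing pattern only (one encoder module reused for both streams); it is not the BART-based EGMS architecture, and the shapes are invented.
```python
import torch
import torch.nn as nn

class SharedDualEncoder(nn.Module):
    """Minimal sketch: a single Transformer encoder (shared weights) applied to
    two multimodal token streams, e.g. text+image and entity+image sequences."""
    def __init__(self, dim=768, n_heads=8, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # reused for both streams

    def forward(self, text_image_seq, entity_image_seq):
        return self.encoder(text_image_seq), self.encoder(entity_image_seq)

enc = SharedDualEncoder()
h_text, h_entity = enc(torch.randn(2, 12, 768), torch.randn(2, 6, 768))
```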
- IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment [17.570243718626994]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs).
We devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions.
We also propose four modal-specific information bottleneck regularizers, limiting the misleading clues in refining modal-specific entity representations.
arXiv Detail & Related papers (2024-07-27T17:12:37Z)
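The IBMEA entry above describes variational encoders that output modal-specific representations as probability distributions, constrained by information-bottleneck regularizers. Below is a generic variational information-bottleneck layer (Gaussian reparameterization plus a KL penalty toward a standard normal prior); it illustrates the general idea only, not the paper's four specific regularizers.
```python
import torch
import torch.nn as nn

class VariationalModalEncoder(nn.Module):
    """Generic sketch: map a modality feature to a Gaussian posterior and penalize
    its KL divergence from N(0, I) as an information-bottleneck term."""
    def __init__(self, in_dim=512, z_dim=128):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, feat):
        mu, logvar = self.mu(feat), self.logvar(feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        kl = 0.5 * (torch.exp(logvar) + mu.pow(2) - 1.0 - logvar).sum(-1).mean()
        return z, kl        # kl is the bottleneck regularizer added to the task loss

enc = VariationalModalEncoder()
z, kl_penalty = enc(torch.randn(16, 512))   # e.g. visual features of 16 entities
```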
- Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning [51.80447197290866]
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
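The MoMoK entry above points to a mixture of modality knowledge experts producing adaptive entity representations. The sketch below shows the standard mixture-of-experts pattern this alludes to: per-modality expert networks combined through a learned gate. The expert structure and gating are generic assumptions, not the paper's design.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityMixtureOfExperts(nn.Module):
    """Generic sketch: one expert network per modality, combined by a learned gate
    so each entity can emphasize different modality experts."""
    def __init__(self, modality_dims, out_dim=256):
        super().__init__()
        self.experts = nn.ModuleDict({m: nn.Linear(d, out_dim) for m, d in modality_dims.items()})
        self.gate = nn.Linear(out_dim * len(modality_dims), len(modality_dims))

    def forward(self, feats):                                   # feats: {modality: (batch, dim)}
        outs = [self.experts[m](feats[m]) for m in self.experts]
        weights = F.softmax(self.gate(torch.cat(outs, dim=-1)), dim=-1)
        return sum(weights[:, i:i + 1] * o for i, o in enumerate(outs))

moe = ModalityMixtureOfExperts({"structure": 200, "image": 512, "text": 768})
entity_repr = moe({"structure": torch.randn(4, 200),
                   "image": torch.randn(4, 512),
                   "text": torch.randn(4, 768)})
```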
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
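NativE's summary mentions a relation-guided dual adaptive fusion module. As one rough reading of "relation-guided" fusion, the snippet below conditions the modality weights on the relation embedding of the triple being scored; it is a plausible illustration, not NativE's actual module.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationGuidedFusion(nn.Module):
    """Rough sketch: modality weights conditioned on the relation of the triple
    being scored, so different relations can attend to different modalities."""
    def __init__(self, dim=256):
        super().__init__()
        self.compat = nn.Linear(2 * dim, 1)     # relation-modality compatibility score

    def forward(self, modality_embs, rel_emb):  # (batch, n_mod, dim), (batch, dim)
        rel = rel_emb.unsqueeze(1).expand(-1, modality_embs.size(1), -1)
        scores = self.compat(torch.cat([rel, modality_embs], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)                        # (batch, n_mod)
        return (weights.unsqueeze(-1) * modality_embs).sum(dim=1)  # relation-aware entity

fuse = RelationGuidedFusion()
entity = fuse(torch.randn(4, 3, 256), torch.randn(4, 256))
```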
- The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework [46.69058301083775]
A Multi-Modal Knowledge Graph (MMKG) representation learning framework is crucial for integrating structured knowledge into multi-modal Large Language Models (LLMs) at scale.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its robustness and versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z)
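The SNAG entry above describes a Transformer equipped with modality-level noise masking. A common way to realize that idea is to randomly replace whole modality embeddings with noise during training, as sketched below; the masking scheme in the paper itself may differ.
```python
import torch

def modality_noise_mask(modality_embs, p=0.2, noise_std=0.1):
    """Sketch: during training, replace each modality embedding with Gaussian noise
    with probability p, pushing the model to tolerate noisy or missing modalities.
    modality_embs: (batch, n_modalities, dim)."""
    drop = torch.rand(modality_embs.shape[:2], device=modality_embs.device) < p
    noise = noise_std * torch.randn_like(modality_embs)
    return torch.where(drop.unsqueeze(-1), noise, modality_embs)

robust_embs = modality_noise_mask(torch.randn(4, 3, 256))
```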
- Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion [40.86196588992357]
Multi-modal knowledge graph completion (MMKGC) aims to predict the missing triples in the multi-modal knowledge graphs.
We propose Adaptive Multi-modal Fusion and Modality Adversarial Training (AdaMF-MAT) to unleash the power of imbalanced modality information.
Our approach is a co-design of the MMKGC model and training strategy which can outperform 19 recent MMKGC methods.
arXiv Detail & Related papers (2024-02-22T05:48:03Z)
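AdaMF-MAT pairs adaptive multi-modal fusion with modality adversarial training. The toy sketch below shows only the adversarial half in its generic GAN form: a generator synthesizes modality embeddings for modality-poor entities while a discriminator tries to tell them from real ones. The module shapes and training setup are assumptions, not the paper's design.
```python
import torch
import torch.nn as nn

class ModalityGenerator(nn.Module):
    """Sketch: synthesize a visual/textual embedding from the structural embedding
    of an entity whose real modality data is scarce."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, struct_emb):
        return self.net(struct_emb)

class ModalityDiscriminator(nn.Module):
    """Sketch: score whether a modality embedding is real or generated
    (train with nn.BCEWithLogitsLoss)."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, modality_emb):
        return self.net(modality_emb)

gen, disc = ModalityGenerator(), ModalityDiscriminator()
fake = gen(torch.randn(8, 256))                  # generated modality embeddings
logits_real, logits_fake = disc(torch.randn(8, 256)), disc(fake)
```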
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer can obtain SOTA performance on four datasets of multimodal link prediction, multimodal RE, and multimodal NER.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)