Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion
- URL: http://arxiv.org/abs/2405.16869v1
- Date: Mon, 27 May 2024 06:36:17 GMT
- Title: Mixture of Modality Knowledge Experts for Robust Multi-modal Knowledge Graph Completion
- Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen,
- Abstract summary: Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs)
Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts.
We introduce a novel MMKGC framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal embedding under intricate relational contexts.
- Score: 51.80447197290866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal knowledge graph completion (MMKGC) aims to automatically discover new knowledge triples in the given multi-modal knowledge graphs (MMKGs), which is achieved by collaborative modeling the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods tend to focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel MMKGC framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal embedding under intricate relational contexts. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multi-modalities to achieve comprehensive decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.
Related papers
- PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents [58.35492519636351]
PIN format is built on three foundational principles: knowledge intensity, scalability, and support for diverse training modalities.
We present PIN-14M, an open-source dataset comprising 14 million samples derived from a diverse range of Chinese and English sources.
arXiv Detail & Related papers (2024-06-20T01:43:08Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion [51.80447197290866]
We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
arXiv Detail & Related papers (2024-04-15T05:40:41Z) - Zero-Shot Relational Learning for Multimodal Knowledge Graphs [31.215889061734295]
One of the major challenges is inference on newly discovered relations without associated training data.
Existing works fail to support the leverage of multimodal information and leave the problem unexplored.
We propose a novel end-to-end framework, consisting of three components, i.e., multimodal learner, structure consolidator embedding generator, to integrate diverse multimodal information and knowledge graph structures.
arXiv Detail & Related papers (2024-04-09T11:14:45Z) - NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z) - Unleashing the Power of Imbalanced Modality Information for Multi-modal
Knowledge Graph Completion [40.86196588992357]
Multi-modal knowledge graph completion (MMKGC) aims to predict the missing triples in the multi-modal knowledge graphs.
We propose Adaptive Multi-modal Fusion and Modality Adversarial Training (AdaMF-MAT) to unleash the power of imbalanced modality information.
Our approach is a co-design of the MMKGC model and training strategy which can outperform 19 recent MMKGC methods.
arXiv Detail & Related papers (2024-02-22T05:48:03Z) - Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z) - Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition
with Auxiliary Refined Knowledge [27.152813529536424]
We present PGIM -- a two-stage framework that aims to leverage ChatGPT as an implicit knowledge base.
PGIM generates auxiliary knowledge for more efficient entity prediction.
It outperforms state-of-the-art methods on two classic MNER datasets.
arXiv Detail & Related papers (2023-05-20T15:24:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.