Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
- URL: http://arxiv.org/abs/2405.16869v2
- Date: Wed, 02 Oct 2024 14:42:10 GMT
- Title: Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
- Authors: Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Binbin Hu, Ziqi Liu, Wen Zhang, Huajun Chen,
- Abstract summary: Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
- Score: 51.80447197290866
- License:
- Abstract: Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning, which can enhance reasoning tasks within the MMKGs, such as MMKG completion (MMKGC). The main challenge is to collaboratively model the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK for short) to learn adaptive multi-modal entity representations for better MMKGC. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multi-modalities to achieve joint decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MoMoK under complex scenarios.
Related papers
- MIND: Modality-Informed Knowledge Distillation Framework for Multimodal Clinical Prediction Tasks [50.98856172702256]
We propose the Modality-INformed knowledge Distillation (MIND) framework, a multimodal model compression approach.
MIND transfers knowledge from ensembles of pre-trained deep neural networks of varying sizes into a smaller multimodal student.
We evaluate MIND on binary and multilabel clinical prediction tasks using time series data and chest X-ray images.
arXiv Detail & Related papers (2025-02-03T08:50:00Z) - Multimodal Reasoning with Multimodal Knowledge Graph [19.899398342533722]
Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge.
We propose the Multimodal Reasoning with Multimodal Knowledge Graph (MR-MKG) method to learn rich and semantic knowledge across modalities.
arXiv Detail & Related papers (2024-06-04T07:13:23Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs.
Existing MMKGC methods usually extract multi-modal features with pre-trained models.
We introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities.
arXiv Detail & Related papers (2024-04-15T05:40:41Z) - Zero-Shot Relational Learning for Multimodal Knowledge Graphs [31.215889061734295]
We propose a novel end-to-end framework, consisting of three components, i.e., multimodal learner, structure consolidator embedding generator, to integrate diverse multimodal information and knowledge graph structures.
Evaluation results on three multimodal knowledge graphs demonstrate the superior performance of our proposed method.
arXiv Detail & Related papers (2024-04-09T11:14:45Z) - Noise-powered Multi-modal Knowledge Graph Representation Framework [52.95468915728721]
The rise of Multi-modal Pre-training highlights the necessity for a unified Multi-Modal Knowledge Graph representation learning framework.
We propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking.
Our approach achieves SOTA performance across a total of ten datasets, demonstrating its versatility.
arXiv Detail & Related papers (2024-03-11T15:48:43Z) - Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA firstly learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.