Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment
- URL: http://arxiv.org/abs/2310.06365v1
- Date: Tue, 10 Oct 2023 07:06:06 GMT
- Title: Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity Alignment
- Authors: Qian Li, Cheng Ji, Shu Guo, Zhaoji Liang, Lihong Wang, Jianxin Li
- Abstract summary: We propose a novel MMEA transformer, called MoAlign, that hierarchically introduces neighbor features, multi-modal attributes, and entity types.
Taking advantage of the transformer's ability to integrate multiple types of information, we design a hierarchical modifiable self-attention block in a transformer encoder.
Our approach outperforms strong competitors and achieves excellent entity alignment performance.
- Score: 17.592908862768425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Modal Entity Alignment (MMEA) is a critical task that aims to identify
equivalent entity pairs across multi-modal knowledge graphs (MMKGs). However,
this task faces challenges due to the presence of different types of
information, including neighboring entities, multi-modal attributes, and entity
types. Directly incorporating the above information (e.g., concatenation or
attention) can lead to an unaligned information space. To address these
challenges, we propose a novel MMEA transformer, called MoAlign, that
hierarchically introduces neighbor features, multi-modal attributes, and entity
types to enhance the alignment task. Taking advantage of the transformer's
ability to integrate multiple types of information, we design a hierarchical
modifiable self-attention block in a transformer encoder to preserve the unique
semantics of different information. Furthermore, we design two entity-type
prefix injection methods to integrate entity-type information using type
prefixes, which help to restrict the global information of entities not present
in the MMKGs. Our extensive experiments on benchmark datasets demonstrate that
our approach outperforms strong competitors and achieves excellent entity
alignment performance.
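To make the architectural idea above concrete, here is a minimal, illustrative PyTorch sketch of an encoder layer that prepends a learned entity-type embedding as a prefix token to the sequence of neighbor and multi-modal attribute embeddings before self-attention, and reads the entity representation off the prefix position. The module names, dimensions, single-prefix design, and toy usage below are assumptions for illustration only, not the authors' MoAlign implementation.

```python
# Illustrative sketch only: a transformer encoder layer with an entity-type
# prefix token over neighbor and multi-modal attribute embeddings.
# All hyperparameters and names are assumptions, not the released MoAlign code.
import torch
import torch.nn as nn


class TypePrefixEncoderLayer(nn.Module):
    def __init__(self, dim: int = 256, num_types: int = 50, num_heads: int = 4):
        super().__init__()
        self.type_embed = nn.Embedding(num_types, dim)   # learned entity-type prefix
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, neighbor_feats, attr_feats, type_ids):
        # neighbor_feats: (B, N, dim) neighbor entity embeddings
        # attr_feats:     (B, A, dim) multi-modal attribute embeddings (e.g. text/image)
        # type_ids:       (B,)        entity-type index per entity
        prefix = self.type_embed(type_ids).unsqueeze(1)   # (B, 1, dim)
        x = torch.cat([prefix, neighbor_feats, attr_feats], dim=1)
        h, _ = self.attn(x, x, x)                         # self-attention over all tokens
        x = self.norm1(x + h)
        x = self.norm2(x + self.ffn(x))
        return x[:, 0]                                    # prefix position as entity representation


# Toy usage: compare two entities by cosine similarity of their encoded representations.
enc = TypePrefixEncoderLayer()
e1 = enc(torch.randn(2, 5, 256), torch.randn(2, 3, 256), torch.tensor([3, 7]))
e2 = enc(torch.randn(2, 5, 256), torch.randn(2, 3, 256), torch.tensor([3, 7]))
print(torch.cosine_similarity(e1, e2, dim=-1).shape)      # torch.Size([2])
```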
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
- IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment [17.570243718626994]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs).
We devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions.
We also propose four modal-specific information bottleneck regularizers that limit misleading clues when refining modal-specific entity representations.
arXiv Detail & Related papers (2024-07-27T17:12:37Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose NativE, a comprehensive framework to achieve MMKGC in the wild.
NativE introduces a relation-guided dual adaptive fusion module that enables adaptive fusion for arbitrary modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- MMSFormer: Multimodal Transformer for Material and Semantic Segmentation [16.17270247327955]
We propose a novel fusion strategy that can effectively fuse information from different modality combinations.
We also propose a new model named Multi-Modal TransFormer (MMSFormer) that incorporates the proposed fusion strategy.
MMSFormer outperforms current state-of-the-art models on three different datasets.
arXiv Detail & Related papers (2023-09-07T20:07:57Z)
- MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities [25.059177235004952]
We propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities.
MultiExpan, a powerful multi-modal model pre-trained on four multimodal pre-training tasks, is proposed.
The MESED dataset is the first multi-modal dataset for ESE with large-scale and elaborate manual calibration.
arXiv Detail & Related papers (2023-07-27T14:09:59Z)
- MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs.
MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation.
This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z)
- Transformer-based Entity Typing in Knowledge Graphs [17.134032162338833]
We propose a novel Transformer-based Entity Typing (TET) approach that effectively encodes the content of an entity's neighbors.
Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state-of-the-art.
arXiv Detail & Related papers (2022-10-20T10:40:25Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions (a minimal sketch of such a contrastive alignment objective appears after this list).
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
The proposed MKGformer achieves SOTA performance on four datasets spanning multimodal link prediction, multimodal relation extraction, and multimodal named entity recognition.
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
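Contrastive MMEA approaches such as MCLEA (listed above) learn entity embeddings from each knowledge graph and pull aligned pairs together while pushing unaligned entities apart. Below is a minimal sketch of an InfoNCE-style contrastive alignment loss with in-batch negatives; the temperature value, symmetric formulation, and toy usage are illustrative assumptions rather than any specific paper's exact setup.

```python
# Minimal sketch of an InfoNCE-style contrastive alignment loss, in the spirit of
# contrastive MMEA methods such as MCLEA. Values and design choices are illustrative.
import torch
import torch.nn.functional as F


def contrastive_alignment_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] embed the same entity in two MMKGs; other rows act as in-batch negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                                   # (B, B) pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Symmetric cross-entropy: align KG1 -> KG2 and KG2 -> KG1.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))


# Toy usage with random embeddings standing in for two graphs' entity encoders.
loss = contrastive_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```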