DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking
- URL: http://arxiv.org/abs/2310.05589v1
- Date: Mon, 9 Oct 2023 10:21:42 GMT
- Title: DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking
- Authors: Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai
- Abstract summary: We propose a novel framework called Dynamic Relation Interactive Network (DRIN) for MEL tasks.
DRIN explicitly models four different types of alignment between a mention and entity and builds a dynamic Graph Convolutional Network (GCN) to dynamically select the corresponding alignment relations for different input samples.
Experiments on two datasets show that DRIN outperforms state-of-the-art methods by a large margin, demonstrating the effectiveness of our approach.
- Score: 31.15972952813689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Entity Linking (MEL) is a task that aims to link ambiguous
mentions within multimodal contexts to referential entities in a multimodal
knowledge base. Recent methods for MEL adopt a common framework: they first
interact and fuse the text and image to obtain representations of the mention
and entity respectively, and then compute the similarity between them to
predict the correct entity. However, these methods still suffer from two
limitations: first, as they fuse the features of text and image before
matching, they cannot fully exploit the fine-grained alignment relations
between the mention and entity. Second, their alignment is static, leading to
low performance when dealing with complex and diverse data. To address these
issues, we propose a novel framework called Dynamic Relation Interactive
Network (DRIN) for MEL tasks. DRIN explicitly models four different types of
alignment between a mention and entity and builds a dynamic Graph Convolutional
Network (GCN) to dynamically select the corresponding alignment relations for
different input samples. Experiments on two datasets show that DRIN outperforms
state-of-the-art methods by a large margin, demonstrating the effectiveness of
our approach.
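The contrast drawn in the abstract, between fusing modalities before matching and weighting alignment relations per sample, can be illustrated with a minimal sketch. This is not the DRIN model: the abstract does not name the four alignment types, so the sketch assumes they are the four pairings of mention text/image with entity text/image, and it swaps the paper's dynamic GCN for a plain softmax gate. All function names, the `temperature` parameter, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relation_scores(mention, entity):
    """Assumed relation set: text-text, text-image, image-text, image-image."""
    return [cosine(mention[a], entity[b])
            for a in ("text", "image")
            for b in ("text", "image")]


def static_score(mention, entity):
    """Fuse-then-match baseline: average the two modalities, then compare once."""
    def fuse(x):
        return [(t + i) / 2 for t, i in zip(x["text"], x["image"])]
    return cosine(fuse(mention), fuse(entity))


def dynamic_score(mention, entity, temperature=0.5):
    """Weight the four relation scores with a gate computed from this sample.

    A softmax gate stands in for the dynamic GCN described in the abstract.
    """
    rel = relation_scores(mention, entity)
    weights = softmax([r / temperature for r in rel])
    return sum(w * r for w, r in zip(weights, rel))


def link(mention, candidates, score_fn=dynamic_score):
    """Return the name of the candidate entity with the highest score."""
    return max(candidates, key=lambda name: score_fn(mention, candidates[name]))


if __name__ == "__main__":
    mention = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
    candidates = {
        "A": {"text": [1.0, 0.0], "image": [0.0, 1.0]},
        "B": {"text": [-1.0, 0.0], "image": [1.0, 0.0]},
    }
    print(link(mention, candidates))
```

The sketch assumes text and image vectors share one dimension so that they can be averaged and compared directly; a real system would use learned encoders and projections.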
Related papers
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- A Dual-way Enhanced Framework from Text Matching Point of View for Multimodal Entity Linking [17.847936914174543]
Multimodal Entity Linking (MEL) aims at linking ambiguous mentions with multimodal information to entities in a Knowledge Graph (KG) such as Wikipedia.
We formulate multimodal entity linking as a neural text matching problem in which each piece of multimodal information (text and image) is treated as a query.
This paper introduces a dual-way enhanced (DWE) framework for MEL.
arXiv Detail & Related papers (2023-12-19T03:15:50Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations of text, entity pair, and image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot situation.
arXiv Detail & Related papers (2023-06-19T15:31:34Z)
- From Alignment to Entailment: A Unified Textual Entailment Framework for Entity Alignment [17.70562397382911]
Existing methods usually encode the triples of entities as embeddings and learn to align the embeddings.
We transform both triples into unified textual sequences, and model the EA task as a bi-directional textual entailment task.
Our approach captures the unified correlation pattern of two kinds of information between entities, and explicitly models the fine-grained interaction between original entity information.
arXiv Detail & Related papers (2023-05-19T08:06:50Z)
- Learnable Pillar-based Re-ranking for Image-Text Retrieval [119.9979224297237]
Image-text retrieval aims to bridge the modality gap and retrieve cross-modal content based on semantic similarities.
Re-ranking, a popular post-processing practice, has revealed the superiority of capturing neighbor relations in single-modality retrieval tasks.
We propose a novel learnable pillar-based re-ranking paradigm for image-text retrieval.
arXiv Detail & Related papers (2023-04-25T04:33:27Z)
- Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging [19.872199943795195]
This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task.
The proposed method can leverage the edge information to assist alignment between objects and entities.
arXiv Detail & Related papers (2022-11-28T03:23:54Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA firstly learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets and define two practical instance-level retrieval tasks.
We train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Dynamic Relation Discovery and Utilization in Multi-Entity Time Series Forecasting [92.32415130188046]
In many real-world scenarios, there may exist crucial yet implicit relations between entities.
We propose an attentional multi-graph neural network with automatic graph learning (A2GNN) in this work.
arXiv Detail & Related papers (2022-02-18T11:37:04Z)
- EchoEA: Echo Information between Entities and Relations for Entity Alignment [1.1470070927586016]
We propose a novel framework, Echo Entity Alignment (EchoEA), which leverages a self-attention mechanism to spread entity information to relations and echo it back to entities.
The experimental results on three real-world cross-lingual datasets are stable at around 96% at hits@1 on average.
arXiv Detail & Related papers (2021-07-07T07:34:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.