Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph
Alignment Network and Word-pair Relation Tagging
- URL: http://arxiv.org/abs/2211.15028v1
- Date: Mon, 28 Nov 2022 03:23:54 GMT
- Title: Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph
Alignment Network and Word-pair Relation Tagging
- Authors: Li Yuan, Yi Cai, Jin Wang, Qing Li
- Abstract summary: This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task.
The proposed method leverages edge information to assist the alignment between objects and entities.
- Score: 19.872199943795195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal named entity recognition (MNER) and multimodal relation extraction
(MRE) are two fundamental subtasks in the multimodal knowledge graph
construction task. However, existing methods usually handle the two tasks
independently, which ignores the bidirectional interaction between them. This
paper is the first to propose jointly performing MNER and MRE as a joint
multimodal entity-relation extraction task (JMERE). Besides, the current MNER
and MRE models only consider aligning the visual objects with textual entities
in visual and textual graphs but ignore the entity-entity relationships and
object-object relationships. To address the above challenges, we propose an
edge-enhanced graph alignment network with word-pair relation tagging (EEGA)
for the JMERE task. Specifically, we first design a word-pair relation tagging
scheme to exploit the bidirectional interaction between MNER and MRE and to
avoid error propagation. Then, we propose an edge-enhanced graph alignment
network that enhances the JMERE task by aligning nodes and edges across the
textual and visual graphs. Compared with previous methods, the proposed method
leverages edge information to assist the alignment between objects and
entities and to find the correlations between entity-entity relationships and
object-object relationships. Experiments demonstrate the effectiveness of our
model.
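As a rough illustration of the word-pair relation tagging idea in the abstract, the sketch below fills an n x n word-pair table in which entity spans and inter-entity relations are tagged jointly, so both can be decoded from one structure rather than through a pipeline. The tag names and decoding rules are illustrative assumptions; the paper's exact tagging scheme is not specified here.

```python
# A minimal, illustrative sketch of word-pair relation tagging for joint
# entity-relation extraction. Tag names and decoding rules are assumptions
# for illustration; the paper's actual tagging scheme may differ.

def build_tag_table(tokens, entities, relations):
    """Fill an n x n word-pair table.

    entities:  list of (start, end, type) spans, end inclusive
    relations: list of (head_entity_idx, tail_entity_idx, rel_type)
    """
    n = len(tokens)
    table = [[None] * n for _ in range(n)]
    # Entity spans: tag the (start, end) cell with the entity type.
    for start, end, etype in entities:
        table[start][end] = f"ENT-{etype}"
    # Relations: tag the cell linking the head span start to the tail span start.
    for h, t, rtype in relations:
        hs, _, _ = entities[h]
        ts, _, _ = entities[t]
        table[hs][ts] = f"REL-{rtype}"
    return table

def decode(table):
    """Read entities and relations back out of the same table."""
    entities, relations = [], []
    for i, row in enumerate(table):
        for j, tag in enumerate(row):
            if tag is None:
                continue
            if tag.startswith("ENT-"):
                entities.append((i, j, tag[4:]))
            elif tag.startswith("REL-"):
                relations.append((i, j, tag[4:]))
    return entities, relations

tokens = ["Steve", "Jobs", "founded", "Apple"]
ents = [(0, 1, "PER"), (3, 3, "ORG")]
rels = [(0, 1, "founder_of")]
print(decode(build_tag_table(tokens, ents, rels)))
```

Because entity spans and relations live in the same table, one decoder reads both out together, which illustrates one way to avoid the error propagation of a pipelined MNER-then-MRE setup.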
Related papers
- DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking [31.15972952813689]
We propose a novel framework called Dynamic Relation Interactive Network (DRIN) for MEL tasks.
DRIN explicitly models four different types of alignment between a mention and an entity and builds a dynamic Graph Convolutional Network (GCN) to select the corresponding alignment relations for different input samples.
Experiments on two datasets show that DRIN outperforms state-of-the-art methods by a large margin, demonstrating the effectiveness of our approach.
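A minimal sketch of the per-sample selection idea, assuming each of the four alignment types contributes a bilinear mention-entity score and a softmax gate conditioned on the pair weights them; the projection matrices and gate are illustrative stand-ins, not DRIN's actual modules.

```python
# Illustrative sketch only: score a mention-entity pair under four alignment
# types and let a sample-dependent gate decide how much each type contributes.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def dynamic_alignment_score(mention_vec, entity_vec, type_projections, gate_w):
    """type_projections: list of 4 (d, d) matrices, one per alignment type."""
    # Bilinear score of the pair under each alignment type.
    type_scores = np.array([mention_vec @ P @ entity_vec for P in type_projections])
    # Gate over the four alignment types, conditioned on this specific pair.
    gate = softmax(gate_w @ np.concatenate([mention_vec, entity_vec]))
    return float(gate @ type_scores)

d = 8
rng = np.random.default_rng(0)
score = dynamic_alignment_score(
    rng.normal(size=d), rng.normal(size=d),
    [rng.normal(size=(d, d)) for _ in range(4)],
    rng.normal(size=(4, 2 * d)),
)
print(score)
```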
arXiv Detail & Related papers (2023-10-09T10:21:42Z)
- CARE: Co-Attention Network for Joint Entity and Relation Extraction [0.0]
We propose a Co-Attention network for joint entity and relation extraction.
Our approach includes adopting a parallel encoding strategy to learn separate representations for each subtask.
At the core of our approach is the co-attention module that captures two-way interaction between the two subtasks.
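A small numpy sketch of that two-way interaction: entity-subtask token states attend over relation-subtask states and vice versa. The single dot-product affinity matrix and the absence of learned projections are simplifying assumptions, not CARE's exact co-attention module.

```python
# Illustrative co-attention between two subtask-specific token representations.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(H_ent, H_rel):
    """H_ent: (n, d) entity-subtask states; H_rel: (m, d) relation-subtask states."""
    # Affinity between every entity-task token and every relation-task token.
    A = H_ent @ H_rel.T                        # (n, m)
    ent2rel = softmax(A, axis=1) @ H_rel       # entity view enriched by relation view
    rel2ent = softmax(A.T, axis=1) @ H_ent     # relation view enriched by entity view
    return ent2rel, rel2ent

rng = np.random.default_rng(0)
H_ent, H_rel = rng.normal(size=(6, 16)), rng.normal(size=(6, 16))
ent_ctx, rel_ctx = co_attention(H_ent, H_rel)
print(ent_ctx.shape, rel_ctx.shape)            # (6, 16) (6, 16)
```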
arXiv Detail & Related papers (2023-08-24T03:40:54Z)
- Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER [43.425998295991135]
We propose a bidirectional generative alignment method named BGA-MNER to tackle these issues.
Our method achieves state-of-the-art performance without image input during inference.
arXiv Detail & Related papers (2023-08-03T10:37:20Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations among the text, the entity pair, and the image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot situation.
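As a rough sketch of the gated fusion idea, the snippet below uses two sigmoid gates, conditioned on both modalities, to decide how much visual and textual evidence enters the fused representation. The gate parameterization is an assumption for illustration; the paper's dual-gated, prefix-tuned design is more involved.

```python
# Illustrative gated fusion of a textual and a visual feature vector.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_gated_fusion(h_text, h_image, W_g1, W_g2):
    """h_text, h_image: (d,) feature vectors; W_g1, W_g2: (d, 2d) gate weights."""
    joint = np.concatenate([h_text, h_image])   # (2d,) conditioning for both gates
    g1 = sigmoid(W_g1 @ joint)                  # gate on visual evidence
    g2 = sigmoid(W_g2 @ joint)                  # gate on textual evidence
    return g2 * h_text + g1 * h_image           # fused representation

d = 8
rng = np.random.default_rng(0)
fused = dual_gated_fusion(rng.normal(size=d), rng.normal(size=d),
                          rng.normal(size=(d, 2 * d)), rng.normal(size=(d, 2 * d)))
print(fused.shape)  # (8,)
```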
arXiv Detail & Related papers (2023-06-19T15:31:34Z)
- From Alignment to Entailment: A Unified Textual Entailment Framework for Entity Alignment [17.70562397382911]
Existing methods usually encode the triples of entities as embeddings and learn to align the embeddings.
We transform the triples of both entities into unified textual sequences and model the EA task as a bi-directional textual entailment task.
Our approach captures the unified correlation pattern of two kinds of information between entities and explicitly models the fine-grained interaction between the original information of the two entities.
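A hedged sketch of that formulation: each entity's triples are verbalized into one textual sequence and an entailment score is computed in both directions. The verbalization template is an assumption, and the scoring stub below merely stands in for a real NLI model so the example stays self-contained.

```python
# Illustrative "entity alignment as bi-directional textual entailment" sketch.

def verbalize(entity, triples):
    """Turn (head, relation, tail) triples into a single textual sequence."""
    facts = "; ".join(f"{h} {r.replace('_', ' ')} {t}" for h, r, t in triples)
    return f"{entity}: {facts}"

def entailment_prob(premise, hypothesis):
    """Placeholder for any NLI model returning P(premise entails hypothesis)."""
    # A real implementation would call an NLI classifier here; this word-overlap
    # stub only keeps the sketch runnable without external models.
    overlap = set(premise.lower().split()) & set(hypothesis.lower().split())
    return len(overlap) / max(len(hypothesis.split()), 1)

def alignment_score(seq_a, seq_b):
    # Bi-directional entailment: average the two directions.
    return 0.5 * (entailment_prob(seq_a, seq_b) + entailment_prob(seq_b, seq_a))

a = verbalize("Paris", [("Paris", "capital_of", "France")])
b = verbalize("Paris (city)", [("Paris (city)", "located_in", "France")])
print(round(alignment_score(a, b), 3))
```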
arXiv Detail & Related papers (2023-05-19T08:06:50Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two realistic instance-level retrieval tasks.
We then train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- CoADNet: Collaborative Aggregation-and-Distribution Networks for Co-Salient Object Detection [91.91911418421086]
Co-Salient Object Detection (CoSOD) aims at discovering salient objects that repeatedly appear in a given query group containing two or more relevant images.
One challenging issue is how to effectively capture co-saliency cues by modeling and exploiting inter-image relationships.
We present an end-to-end collaborative aggregation-and-distribution network (CoADNet) to capture both salient and repetitive visual patterns from multiple images.
arXiv Detail & Related papers (2020-11-10T04:28:11Z)
- HittER: Hierarchical Transformers for Knowledge Graph Embeddings [85.93509934018499]
We propose HittER to learn representations of entities and relations in a complex knowledge graph.
Experimental results show that HittER achieves new state-of-the-art results on multiple link prediction datasets.
We additionally propose a simple approach to integrate HittER into BERT and demonstrate its effectiveness on two Freebase factoid question answering datasets.
arXiv Detail & Related papers (2020-08-28T18:58:15Z)
- Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
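A minimal sketch of message passing over such a joint graph, with nodes from both modalities updated from their neighbors over a few iterations. The adjacency, feature sizes, and single-step update rule are illustrative assumptions rather than CSMGAN's actual cross- and self-modal attention layers.

```python
# Illustrative iterative message passing over a joint graph whose nodes mix
# two modalities (e.g. query words and video clips).
import numpy as np

def message_passing_step(H, A, W):
    """H: (n, d) node states; A: (n, n) adjacency of the joint graph; W: (d, d)."""
    # Row-normalize so each node averages its neighbors' messages.
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    msgs = (A / deg) @ H @ W
    return np.tanh(H + msgs)              # residual update

rng = np.random.default_rng(0)
n, d = 6, 16                              # e.g. 4 word nodes + 2 clip nodes
H = rng.normal(size=(n, d))
A = (rng.random((n, n)) > 0.5).astype(float)
np.fill_diagonal(A, 0.0)
W = rng.normal(size=(d, d)) * 0.1
for _ in range(3):                        # a few rounds of message passing
    H = message_passing_step(H, A, W)
print(H.shape)
```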
arXiv Detail & Related papers (2020-08-04T08:25:24Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.