Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction
- URL: http://arxiv.org/abs/2306.11020v1
- Date: Mon, 19 Jun 2023 15:31:34 GMT
- Title: Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction
- Authors: Qian Li, Shu Guo, Cheng Ji, Xutan Peng, Shiyao Cui, and Jianxin Li
- Abstract summary: Multi-Modal Relation Extraction aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations among text, entity pair, and image/objects.
Our approach outperforms strong competitors, even in few-shot settings.
- Score: 13.454953507205278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Modal Relation Extraction (MMRE) aims at identifying the relation
between two entities in texts that contain visual clues. Rich visual content is
valuable for the MMRE task, but existing works cannot model the finer
associations among different modalities well; they fail to capture the truly
helpful visual information and thus limit relation extraction performance. In
this paper, we propose a novel MMRE framework, termed DGF-PT, to better capture
the deeper correlations among text, entity pair, and image/objects, so as to
mine more helpful information for the task. We first propose a prompt-based
autoregressive encoder, which builds task-related intra-modal and inter-modal
feature associations via entity-oriented and object-oriented prefixes,
respectively. To better integrate helpful visual information, we design a
dual-gated fusion module that distinguishes the importance of the image and its
objects and further enriches the text representations. In addition, a
generative decoder is introduced with an entity-type restriction on relations,
which better filters out candidate relations. Extensive experiments on the
benchmark dataset show that our approach outperforms strong competitors, even
in few-shot settings.
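
The abstract names a dual-gated fusion module but does not give its equations. Below is a minimal PyTorch sketch of one plausible reading, in which one sigmoid gate weighs the global image feature and a second gate weighs pooled object features before both enrich the text representation; the layer shapes, the mean-pooling of objects, and all names are assumptions, not the authors' published design.

```python
import torch
import torch.nn as nn

class DualGatedFusion(nn.Module):
    """Hypothetical sketch of a dual-gated fusion module: one gate weighs
    the global image feature, a second gate weighs pooled object features,
    and both gated signals enrich the text representation."""

    def __init__(self, dim: int):
        super().__init__()
        self.img_gate = nn.Linear(2 * dim, dim)  # gate over [text; image]
        self.obj_gate = nn.Linear(2 * dim, dim)  # gate over [text; objects]
        self.out = nn.Linear(3 * dim, dim)       # fuse text with gated visuals

    def forward(self, text, image, objects):
        # text:    (batch, dim)    pooled text / entity-pair representation
        # image:   (batch, dim)    global image feature
        # objects: (batch, k, dim) object-level features
        obj = objects.mean(dim=1)  # pool the k objects (an assumption)
        g_img = torch.sigmoid(self.img_gate(torch.cat([text, image], dim=-1)))
        g_obj = torch.sigmoid(self.obj_gate(torch.cat([text, obj], dim=-1)))
        fused = torch.cat([text, g_img * image, g_obj * obj], dim=-1)
        return self.out(fused)

# usage: enrich a pooled text vector with gated visual evidence
fusion = DualGatedFusion(dim=768)
t, v = torch.randn(4, 768), torch.randn(4, 768)
o = torch.randn(4, 5, 768)
enriched = fusion(t, v, o)  # (4, 768)
```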
Related papers
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
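
The EGMS entry above mentions dual multimodal encoders with shared weights but gives no concrete layout. A hypothetical sketch of the weight-sharing pattern, with a generic Transformer encoder standing in for the BART backbone (depth, head count, and input shapes are all assumptions):

```python
import torch
import torch.nn as nn

class SharedDualEncoder(nn.Module):
    """Sketch of shared-weight dual encoding: the same encoder processes
    the text+image view and the entity+image view, sharing parameters."""

    def __init__(self, dim: int, nhead: int = 8, depth: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, text_tokens, entity_tokens, image_patches):
        # all inputs: (batch, seq_len, dim); views are concatenated sequences
        text_view = self.encoder(torch.cat([text_tokens, image_patches], dim=1))
        entity_view = self.encoder(torch.cat([entity_tokens, image_patches], dim=1))
        return text_view, entity_view
```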
- DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking [31.15972952813689]
We propose a novel framework called Dynamic Relation Interactive Network (DRIN) for MEL tasks.
DRIN explicitly models four different types of alignment between a mention and entity and builds a dynamic Graph Convolutional Network (GCN) to dynamically select the corresponding alignment relations for different input samples.
Experiments on two datasets show that DRIN outperforms state-of-the-art methods by a large margin, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2023-10-09T10:21:42Z)
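
DRIN's summary names four mention-entity alignment types and a dynamic, per-sample selection over them, without further detail. The sketch below shows one generic way to gate four alignment views per sample; the element-wise-product views and the softmax gate are illustrative assumptions standing in for DRIN's actual dynamic GCN:

```python
import torch
import torch.nn as nn

class DynamicAlignmentSelector(nn.Module):
    """Sketch of per-sample selection over four mention-entity alignment
    views (text-text, text-image, image-text, image-image)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(4 * dim, 4)  # one weight per alignment type

    def forward(self, m_txt, m_img, e_txt, e_img):
        # each input: (batch, dim); build the four alignment views
        views = [m_txt * e_txt, m_txt * e_img, m_img * e_txt, m_img * e_img]
        w = torch.softmax(self.gate(torch.cat(views, dim=-1)), dim=-1)
        stacked = torch.stack(views, dim=1)            # (batch, 4, dim)
        return (w.unsqueeze(-1) * stacked).sum(dim=1)  # dynamically weighted
```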
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements internal-information screening and external-information exploitation.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image from the knowledge corpus, respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
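
MoRe's retrieval modules are described only at a high level. A minimal sketch of the generic pattern, a top-k cosine-similarity lookup into an embedded knowledge corpus that can serve both the text query and the image query once each is embedded (the 512-dim space and toy corpus are placeholders):

```python
import torch
import torch.nn.functional as F

def retrieve_knowledge(query_vec, corpus_vecs, corpus_texts, k=3):
    """Return the k corpus passages most similar to the query embedding."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), corpus_vecs)  # (n,)
    topk = sims.topk(k).indices
    return [corpus_texts[i] for i in topk.tolist()]

# usage: retrieve for an embedded text query (an image query would go through
# the same routine once embedded into the shared space), then append the
# retrieved passages to the model input as extra context
corpus_vecs = torch.randn(100, 512)
corpus_texts = [f"passage {i}" for i in range(100)]
hits = retrieve_knowledge(torch.randn(512), corpus_vecs, corpus_texts)
```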
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two real, practical instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction [88.6585431949086]
We propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction.
We regard the visual representation as a pluggable visual prefix that guides the textual representation toward error-insensitive forecasting decisions.
Experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T02:10:55Z)
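
HVPNeT's pluggable visual prefix, like the object-oriented prefixes in DGF-PT above, follows the prefix-tuning pattern: visual features are projected into extra key/value pairs that self-attention layers attend to alongside the token keys/values. A hedged sketch of that generic mechanism (projection shape, pooled input, and prefix length are assumptions):

```python
import torch
import torch.nn as nn

class VisualPrefix(nn.Module):
    """Sketch of a visual prefix: project a pooled visual feature into
    prefix_len extra (key, value) pairs for an attention layer."""

    def __init__(self, vis_dim: int, dim: int, prefix_len: int):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Linear(vis_dim, prefix_len * 2 * dim)

    def forward(self, vis_feat):
        # vis_feat: (batch, vis_dim) pooled visual feature
        kv = self.proj(vis_feat).view(vis_feat.size(0), self.prefix_len, 2, -1)
        k, v = kv[:, :, 0], kv[:, :, 1]  # each (batch, prefix_len, dim)
        return k, v  # to be prepended to the layer's attention keys/values

# usage: build a 4-slot prefix from a 2048-dim image feature for a 768-dim model
prefix = VisualPrefix(vis_dim=2048, dim=768, prefix_len=4)
k, v = prefix(torch.randn(2, 2048))  # k, v: (2, 4, 768)
```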
This list is automatically generated from the titles and abstracts of the papers on this site.