Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
- URL: http://arxiv.org/abs/2305.16166v1
- Date: Thu, 25 May 2023 15:26:13 GMT
- Title: Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis
- Authors: Xuming Hu, Zhijiang Guo, Zhiyang Teng, Irwin King, Philip S. Yu
- Abstract summary: This research proposes to retrieve textual and visual evidence based on the object, the sentence, and the whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning within and across modalities.
- Score: 89.04041100520881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal relation extraction (MRE) is the task of identifying the semantic
relationship between two entities based on the context of a sentence-image
pair. Existing retrieval-augmented approaches have mainly focused on modeling
the retrieved textual knowledge, which may not be enough to identify complex
relations accurately. To improve prediction, this research proposes to
retrieve textual and visual evidence based on the object, the sentence, and
the whole image. We further develop a novel approach to synthesize the
object-level, image-level, and sentence-level information for better reasoning
within and across modalities. Extensive experiments and analyses show that the
proposed method effectively selects and compares evidence across modalities
and significantly outperforms state-of-the-art models.
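As a rough illustration of the retrieve-then-synthesize idea in the abstract, here is a minimal sketch: evidence is retrieved at the object, sentence, and image level by nearest-neighbor lookup, then fused by attention conditioned on the entity pair. The encoders, dimensions, `retrieve` helper, and attention-based fusion are all assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of multi-level cross-modal retrieval and synthesis.
# All module names, dimensions, and the fusion scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def retrieve(query: torch.Tensor, corpus: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Return the k corpus vectors most similar to the query (cosine similarity)."""
    sims = F.cosine_similarity(query.unsqueeze(0), corpus, dim=-1)  # (N,)
    return corpus[sims.topk(k).indices]                             # (k, d)

class EvidenceSynthesizer(nn.Module):
    """Fuse object-, sentence-, and image-level evidence by attending over
    the concatenated evidence set, conditioned on the entity-pair vector."""
    def __init__(self, d: int = 256, n_rel: int = 23):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d, n_rel)

    def forward(self, pair_repr, obj_ev, sent_ev, img_ev):
        evidence = torch.cat([obj_ev, sent_ev, img_ev], dim=0).unsqueeze(0)  # (1, 3k, d)
        query = pair_repr.view(1, 1, -1)                                      # (1, 1, d)
        fused, _ = self.attn(query, evidence, evidence)                       # (1, 1, d)
        return self.classifier(fused.squeeze(0).squeeze(0))                   # (n_rel,)

d = 256
model = EvidenceSynthesizer(d)
pair = torch.randn(d)                  # entity-pair representation (stand-in)
text_corpus = torch.randn(1000, d)     # pre-encoded textual knowledge
image_corpus = torch.randn(1000, d)    # pre-encoded visual knowledge
logits = model(pair,
               retrieve(torch.randn(d), image_corpus),  # object-level query
               retrieve(torch.randn(d), text_corpus),   # sentence-level query
               retrieve(torch.randn(d), image_corpus))  # whole-image query
```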
Related papers
- Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction (MMRE) aims to identify the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations among text, entity pair, and image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot setting.
arXiv Detail & Related papers (2023-06-19T15:31:34Z)
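Read generically, the gated-fusion idea in the entry above lets a learned sigmoid gate decide, per dimension, how much visual signal to mix into the textual representation. A minimal sketch of such a gate follows; the paper's dual gates and prefix-tuning details differ, so everything here is an assumption.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated fusion: a sigmoid gate computed from both modalities
    controls how much of each representation enters the fused vector."""
    def __init__(self, d: int):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([text, image], dim=-1)))
        return g * text + (1.0 - g) * image  # per-dimension convex mix

fusion = GatedFusion(d=128)
fused = fusion(torch.randn(2, 128), torch.randn(2, 128))  # (batch, d)
```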
- Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction [0.0]
We propose a relation extraction approach based on the incorporation of pretrained knowledge graph embeddings at the corpus scale into the sentence-level contextual representation.
A series of experiments revealed promising results for the proposed approach.
arXiv Detail & Related papers (2023-06-07T07:15:20Z)
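Incorporating pretrained knowledge graph embeddings into a sentence-level representation, as the entry above describes, can be as simple as concatenation before classification. A hedged sketch under that assumption; the entity-linking step and TransE-style embedding table are stand-ins, not the paper's pipeline.

```python
import torch
import torch.nn as nn

class KGEnhancedRE(nn.Module):
    """Concatenate pretrained KG embeddings of the two linked entities
    with the sentence-level contextual representation, then classify."""
    def __init__(self, kg_embeddings: torch.Tensor, d_ctx: int, n_rel: int):
        super().__init__()
        # Frozen, pretrained KG embedding table (e.g., TransE-style vectors).
        self.kg = nn.Embedding.from_pretrained(kg_embeddings, freeze=True)
        d_kg = kg_embeddings.size(1)
        self.classifier = nn.Linear(d_ctx + 2 * d_kg, n_rel)

    def forward(self, sent_repr, head_id, tail_id):
        feats = torch.cat([sent_repr, self.kg(head_id), self.kg(tail_id)], dim=-1)
        return self.classifier(feats)

kg_table = torch.randn(50000, 100)           # stand-in pretrained KG embeddings
model = KGEnhancedRE(kg_table, d_ctx=768, n_rel=42)
logits = model(torch.randn(4, 768),          # e.g., [CLS] sentence representations
               torch.tensor([1, 7, 9, 3]),   # linked head-entity ids
               torch.tensor([2, 8, 0, 5]))   # linked tail-entity ids
```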
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval-based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image from the knowledge corpus, respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
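MoRe's two retrieval modules can be pictured as nearest-neighbor lookups against a knowledge corpus, one keyed by the text and one by the image, with the retrieved passages appended to the model input. A toy sketch under those assumptions; the embeddings and passages below are placeholders.

```python
import torch
import torch.nn.functional as F

def nearest_passages(query_vec, passage_vecs, passages, k=2):
    """Return the k passages whose embeddings best match the query."""
    sims = F.cosine_similarity(query_vec.unsqueeze(0), passage_vecs, dim=-1)
    return [passages[i] for i in sims.topk(k).indices.tolist()]

passages = ["Alice founded Acme.", "Acme is based in Berlin.", "Bob joined Acme."]
passage_vecs = torch.randn(len(passages), 64)   # stand-in passage embeddings

text_query = torch.randn(64)    # embedding of the input sentence
image_query = torch.randn(64)   # embedding of the input image

retrieved = (nearest_passages(text_query, passage_vecs, passages)
             + nearest_passages(image_query, passage_vecs, passages))
augmented_input = " [SEP] ".join(["<input sentence>"] + retrieved)
```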
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
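The "supervising intermediate steps" idea above amounts to multi-task training: the model is penalized not only for the final relation label but also for intermediate predictions such as which sentences constitute supporting evidence. A schematic loss under that assumption; SAIS's actual task set is richer, and the label sizes below are placeholders.

```python
import torch
import torch.nn.functional as F

def sais_style_loss(rel_logits, rel_gold, evid_logits, evid_gold, alpha=0.5):
    """Combine the final relation loss with an auxiliary evidence loss.
    evid_logits/evid_gold mark, per sentence, whether it supports the relation."""
    rel_loss = F.cross_entropy(rel_logits, rel_gold)
    evid_loss = F.binary_cross_entropy_with_logits(evid_logits, evid_gold)
    return rel_loss + alpha * evid_loss

loss = sais_style_loss(torch.randn(4, 97), torch.randint(0, 97, (4,)),
                       torch.randn(4, 10), torch.randint(0, 2, (4, 10)).float())
```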
- Learning Relation Alignment for Calibrated Cross-modal Retrieval [52.760541762871505]
We propose a novel metric, Intra-modal Self-attention Distance (ISD), to quantify relation consistency by measuring the semantic distance between linguistic and visual relations.
We present Inter-modal Alignment on Intra-modal Self-attentions (IAIS), a regularized training method to optimize the ISD and calibrate intra-modal self-attentions mutually via inter-modal alignment.
arXiv Detail & Related papers (2021-05-28T14:25:49Z)
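One schematic reading of ISD: compare the text's intra-modal self-attention pattern with the visual self-attention pattern after projecting it through a cross-modal alignment matrix, so a small distance means the two modalities agree on relational structure. The sketch below is an assumption about the computation, not the paper's exact formula.

```python
import torch

def isd_sketch(attn_text, attn_image, align):
    """attn_text: (Lt, Lt) text self-attention; attn_image: (Lv, Lv) visual
    self-attention; align: (Lt, Lv) row-stochastic text-to-region alignment.
    Project the visual attention into text space and compare."""
    projected = align @ attn_image @ align.transpose(0, 1)  # (Lt, Lt)
    return (attn_text - projected).abs().mean()

Lt, Lv = 12, 36
attn_text = torch.softmax(torch.randn(Lt, Lt), dim=-1)
attn_image = torch.softmax(torch.randn(Lv, Lv), dim=-1)
align = torch.softmax(torch.randn(Lt, Lv), dim=-1)
distance = isd_sketch(attn_text, attn_image, align)  # scalar; lower = more consistent
```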
- Reasoning with Latent Structure Refinement for Document-Level Relation Extraction [20.308845516900426]
We propose a novel model that empowers relational reasoning across sentences by automatically inducing a latent document-level graph.
Specifically, our model achieves an F1 score of 59.05 on a large-scale document-level dataset (DocRED).
arXiv Detail & Related papers (2020-05-13T13:36:09Z)
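Inducing a latent document-level graph, as the entry above describes, can be pictured as learning a soft adjacency matrix over mention/entity nodes and propagating information along it. The paper's actual refinement uses structured attention, so the bilinear scorer below is a simplified stand-in.

```python
import torch
import torch.nn as nn

class LatentGraphLayer(nn.Module):
    """Induce a soft adjacency over nodes with a bilinear score,
    then propagate node states along the induced edges (GCN-style)."""
    def __init__(self, d: int):
        super().__init__()
        self.bilinear = nn.Bilinear(d, d, 1)
        self.update = nn.Linear(d, d)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:  # (n, d)
        n = nodes.size(0)
        left = nodes.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
        right = nodes.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
        adj = torch.softmax(self.bilinear(left, right).view(n, n), dim=-1)
        return torch.relu(self.update(adj @ nodes))  # refined node states

layer = LatentGraphLayer(d=64)
refined = layer(torch.randn(8, 64))  # 8 mention/entity nodes
```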
- A Dependency Syntactic Knowledge Augmented Interactive Architecture for End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn).
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
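A graph convolution that embeds dependency relation types, in the spirit of the DreGcn entry above, can add a learned per-edge-type vector to each neighbor message. A minimal sketch of that idea; the layer sizes and type-embedding scheme are assumptions, not DreGcn's exact design.

```python
import torch
import torch.nn as nn

class DepRelGCNLayer(nn.Module):
    """One GCN layer over a dependency graph where each edge carries a
    dependency-relation type; the type embedding is added to the message."""
    def __init__(self, d: int, n_dep_types: int):
        super().__init__()
        self.w = nn.Linear(d, d)
        self.rel = nn.Embedding(n_dep_types, d)

    def forward(self, h, edges, types):
        """h: (n, d) token states; edges: (m, 2) head->dependent index pairs;
        types: (m,) dependency-relation type ids."""
        msg = self.w(h[edges[:, 0]]) + self.rel(types)      # (m, d) typed messages
        out = torch.zeros_like(h).index_add(0, edges[:, 1], msg)  # aggregate at dependents
        return torch.relu(out + h)                          # residual update

layer = DepRelGCNLayer(d=64, n_dep_types=45)                # ~45 UD relation types
h = layer(torch.randn(5, 64),
          torch.tensor([[0, 1], [0, 2], [2, 3], [2, 4]]),  # dependency arcs
          torch.tensor([0, 1, 2, 2]))                       # arc type ids (nsubj, ...)
```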
- How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
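A "typical context modeling method" in the setting above can be as simple as concatenating preceding utterances of the interaction to the current question before encoding. A toy sketch of that baseline; the separator token and turn limit are assumptions.

```python
def with_context(history, question, max_turns=3):
    """Prepend up to max_turns previous utterances to the current question,
    oldest first, so the parser can resolve references like 'those students'."""
    context = history[-max_turns:]
    return " <s> ".join(context + [question])

history = ["Show all students.", "Which of them major in CS?"]
parser_input = with_context(history, "Sort those by GPA.")
# -> "Show all students. <s> Which of them major in CS? <s> Sort those by GPA."
```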
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.