Information Screening whilst Exploiting! Multimodal Relation Extraction
with Feature Denoising and Multimodal Topic Modeling
- URL: http://arxiv.org/abs/2305.11719v2
- Date: Thu, 25 May 2023 04:08:21 GMT
- Title: Information Screening whilst Exploiting! Multimodal Relation Extraction
with Feature Denoising and Multimodal Topic Modeling
- Authors: Shengqiong Wu, Hao Fei, Yixin Cao, Lidong Bing, Tat-Seng Chua
- Abstract summary: Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
- Score: 96.75821232222201
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing research on multimodal relation extraction (MRE) faces two
co-existing challenges, internal-information over-utilization and
external-information under-exploitation. To combat that, we propose a novel
framework that simultaneously implements the idea of internal-information
screening and external-information exploiting. First, we represent the
fine-grained semantic structures of the input image and text with the visual
and textual scene graphs, which are further fused into a unified cross-modal
graph (CMG). Based on CMG, we perform structure refinement with the guidance of
the graph information bottleneck principle, actively denoising the
less-informative features. Next, we perform topic modeling over the input image
and text, incorporating latent multimodal topic features to enrich the
contexts. On the benchmark MRE dataset, our system outperforms the current best
model significantly. With further in-depth analyses, we reveal the great
potential of our method for the MRE task. Our code is available at
https://github.com/ChocoWu/MRE-ISE.
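To make the pipeline described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of its two core ideas: screening (pruning less-informative edges of a fused cross-modal graph under an information-bottleneck-style penalty) and exploiting (enriching the pooled representation with a latent multimodal topic vector). Every module name, dimension, and hyperparameter here is an illustrative assumption, not the released MRE-ISE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeScreener(nn.Module):
    """Scores each edge of a fused cross-modal graph with a keep-probability."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # nodes: [N, dim] node features; edges: [E, 2] (source, target) indices
        pair = torch.cat([nodes[edges[:, 0]], nodes[edges[:, 1]]], dim=-1)
        return torch.sigmoid(self.scorer(pair)).squeeze(-1)  # [E] keep-probs

def ib_penalty(keep_prob: torch.Tensor, prior: float = 0.3) -> torch.Tensor:
    # KL(Bernoulli(keep_prob) || Bernoulli(prior)): an information-bottleneck-style
    # cost that discourages keeping an edge unless it helps the task loss.
    p = keep_prob.clamp(1e-6, 1 - 1e-6)
    return (p * torch.log(p / prior)
            + (1 - p) * torch.log((1 - p) / (1 - prior))).mean()

def training_step(nodes, edges, topic_vec, label, screener, classifier, beta=0.1):
    # 1) Screen: weight each neighbour message by its edge keep-probability.
    keep_prob = screener(nodes, edges)
    agg = torch.zeros_like(nodes)
    agg.index_add_(0, edges[:, 1], keep_prob.unsqueeze(-1) * nodes[edges[:, 0]])
    # 2) Exploit: enrich the pooled graph representation with a latent topic vector.
    pooled = torch.cat([agg.mean(dim=0), topic_vec], dim=-1)
    logits = classifier(pooled).unsqueeze(0)  # [1, num_relations]
    return F.cross_entropy(logits, label) + beta * ib_penalty(keep_prob)

# Example usage with made-up shapes: 8 graph nodes, 512-d features,
# a 64-d topic vector, and 23 relation classes.
nodes = torch.randn(8, 512)
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
screener = EdgeScreener(512)
classifier = nn.Linear(512 + 64, 23)
loss = training_step(nodes, edges, torch.randn(64), torch.tensor([5]),
                     screener, classifier)
loss.backward()
```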
Related papers
- Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model [16.03304915788997]
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts.
Existing methods for JMERE require large amounts of labeled data.
We introduce the Knowledge-Enhanced Cross-modal Prompt Model.
arXiv Detail & Related papers (2024-10-18T07:14:54Z) - Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS).
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z) - MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion [51.80447197290866]
We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
arXiv Detail & Related papers (2024-04-15T05:40:41Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Multi-source Semantic Graph-based Multimodal Sarcasm Explanation
Generation [53.97962603641629]
We propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM.
TEAM extracts the object-level semantic meta-data instead of the traditional global visual features from the input image.
TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source semantic relations.
arXiv Detail & Related papers (2023-06-29T03:26:10Z) - Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction (MMRE) aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations of text, entity pair, and image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot situation.
arXiv Detail & Related papers (2023-06-19T15:31:34Z) - Enhancing Multimodal Entity and Relation Extraction with Variational
Information Bottleneck [12.957002659910456]
We study multimodal named entity recognition (MNER) and multimodal relation extraction (MRE).
The core of MNER and MRE lies in incorporating evident visual information to enhance textual semantics.
We propose a novel method for MNER and MRE by Multi-Modal representation learning with Information Bottleneck (MMIB).
arXiv Detail & Related papers (2023-04-05T09:32:25Z) - Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image from the knowledge corpus, respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.