Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge
- URL: http://arxiv.org/abs/2305.12212v2
- Date: Wed, 18 Oct 2023 17:05:58 GMT
- Title: Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge
- Authors: Jinyuan Li, Han Li, Zhuo Pan, Di Sun, Jiahao Wang, Wenkun Zhang, Gang Pan
- Abstract summary: We present PGIM -- a two-stage framework that aims to leverage ChatGPT as an implicit knowledge base.
PGIM generates auxiliary knowledge for more efficient entity prediction.
It outperforms state-of-the-art methods on two classic MNER datasets.
- Score: 27.152813529536424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Named Entity Recognition (MNER) on social media aims to enhance
textual entity prediction by incorporating image-based clues. Existing studies
mainly focus on maximizing the utilization of pertinent image information or
incorporating external knowledge from explicit knowledge bases. However, these
methods either neglect the need to provide the model with external knowledge
or suffer from high redundancy in the retrieved knowledge.
In this paper, we present PGIM -- a two-stage framework that aims to leverage
ChatGPT as an implicit knowledge base and enable it to heuristically generate
auxiliary knowledge for more efficient entity prediction. Specifically, PGIM
contains a Multimodal Similar Example Awareness module that selects suitable
examples from a small number of predefined artificial samples. These examples
are then integrated into a formatted prompt template tailored to MNER, guiding
ChatGPT to generate auxiliary refined knowledge. Finally, the acquired
knowledge is integrated with the original text and fed into a downstream model
for further processing. Extensive experiments show that PGIM outperforms
state-of-the-art methods on two classic MNER datasets and exhibits stronger
robustness and generalization capability.
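The abstract describes a concrete two-stage pipeline, so a short sketch may help make the flow tangible. This is a minimal illustration under stated assumptions, not the authors' implementation: the OpenAI chat client, the sentence-transformers encoder, the example pool contents, the prompt wording, and the [SEP] concatenation are all hypothetical stand-ins chosen for the sketch.

```python
# Minimal sketch of a PGIM-style two-stage pipeline (illustrative only).
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A small pool of predefined artificial samples: tweet text, an image
# caption standing in for visual content, and gold entity annotations.
EXAMPLE_POOL = [
    {"text": "Heading to SXSW with @jack!", "caption": "two men on a stage",
     "entities": "SXSW (MISC), jack (PER)"},
    # ... more hand-crafted samples ...
]

def select_similar_examples(text: str, caption: str, k: int = 3) -> list:
    """Multimodal Similar Example Awareness: rank the predefined samples
    by embedding similarity to the combined text + image description."""
    query = encoder.encode(f"{text} {caption}", convert_to_tensor=True)
    pool = encoder.encode([f"{ex['text']} {ex['caption']}" for ex in EXAMPLE_POOL],
                          convert_to_tensor=True)
    scores = util.cos_sim(query, pool)[0]
    top = scores.topk(min(k, len(EXAMPLE_POOL))).indices.tolist()
    return [EXAMPLE_POOL[i] for i in top]

def generate_auxiliary_knowledge(text: str, caption: str) -> str:
    """Stage 1: build an MNER-tailored prompt around the selected examples
    and ask the LLM for refined auxiliary knowledge."""
    demos = "\n\n".join(
        f"Text: {ex['text']}\nImage: {ex['caption']}\nEntities: {ex['entities']}"
        for ex in select_similar_examples(text, caption)
    )
    prompt = ("Identify the named entities and explain the reasoning, "
              "using the image description as a clue.\n\n"
              f"{demos}\n\nText: {text}\nImage: {caption}\nEntities:")
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def downstream_input(text: str, caption: str) -> str:
    """Stage 2: concatenate the auxiliary knowledge with the original text;
    the result would feed a conventional transformer-based MNER tagger."""
    return f"{text} [SEP] {generate_auxiliary_knowledge(text, caption)}"
```

The similarity-based selection matters because in-context examples close to the input tend to elicit more relevant auxiliary knowledge from the LLM than randomly chosen ones.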
Related papers
- RoRA-VLM: Robust Retrieval-Augmented Vision Language Models [41.09545760534495]
RoRA-VLM is a novel and robust retrieval augmentation framework specifically tailored for vision-language models.
We conduct extensive experiments to validate the effectiveness and robustness of our proposed methods on three widely adopted benchmark datasets.
arXiv Detail & Related papers (2024-10-11T14:51:00Z)
- Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning [51.80447197290866]
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Dual-Gated Fusion with Prefix-Tuning for Multi-Modal Relation Extraction [13.454953507205278]
Multi-Modal Relation Extraction aims at identifying the relation between two entities in texts that contain visual clues.
We propose a novel MMRE framework to better capture the deeper correlations of text, entity pair, and image/objects.
Our approach achieves excellent performance compared to strong competitors, even in the few-shot setting.
arXiv Detail & Related papers (2023-06-19T15:31:34Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose MoRe, a novel Multi-modal Retrieval-based framework (see the sketch after this list).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge for the input text and image from a knowledge corpus, respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
- KAT: A Knowledge Augmented Transformer for Vision-and-Language [56.716531169609915]
We propose a novel model - Knowledge Augmented Transformer (KAT) - which achieves a strong state-of-the-art result on the open-domain multimodal task of OK-VQA.
Our approach integrates implicit and explicit knowledge in an end-to-end encoder-decoder architecture, while jointly reasoning over both knowledge sources during answer generation.
In our analysis, explicit knowledge integration also improves the interpretability of model predictions.
arXiv Detail & Related papers (2021-12-16T04:37:10Z)
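As referenced in the MoRe entry above, its dual-retrieval architecture is concrete enough to sketch. The following is a minimal illustration under assumptions, not the MoRe authors' code: it presumes FAISS indexes already built over a shared knowledge corpus and precomputed text/image query embeddings, and every function and variable name is hypothetical.

```python
# Illustrative sketch of a MoRe-style dual retrieval step (hypothetical names).
import faiss
import numpy as np

def retrieve(index: faiss.Index, query_vec: np.ndarray, corpus: list, k: int = 5) -> list:
    """Return the k nearest knowledge snippets for one query embedding."""
    _, ids = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    return [corpus[i] for i in ids[0]]

def more_style_context(text_vec, image_vec, text_index, image_index, corpus) -> str:
    """Merge the knowledge retrieved for the text and for the image; the
    result would be prepended to the downstream NER/RE model's input."""
    knowledge = (retrieve(text_index, text_vec, corpus)
                 + retrieve(image_index, image_vec, corpus))
    return " [SEP] ".join(knowledge)
```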
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.