MNER-QG: An End-to-End MRC framework for Multimodal Named Entity
Recognition with Query Grounding
- URL: http://arxiv.org/abs/2211.14739v1
- Date: Sun, 27 Nov 2022 06:10:03 GMT
- Title: MNER-QG: An End-to-End MRC framework for Multimodal Named Entity
Recognition with Query Grounding
- Authors: Meihuizi Jia, Lei Shen, Xin Shen, Lejian Liao, Meng Chen, Xiaodong He,
Zhendong Chen, Jiaqi Li
- Abstract summary: Multimodal named entity recognition (MNER) is a critical step in information extraction.
We propose a novel end-to-end framework named MNER-QG that can simultaneously perform MRC-based MRC-based named entity recognition and query grounding.
- Score: 21.49274082010887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal named entity recognition (MNER) is a critical step in information
extraction, which aims to detect entity spans and classify them to
corresponding entity types given a sentence-image pair. Existing methods either
(1) obtain named entities with coarse-grained visual clues from attention
mechanisms, or (2) first detect fine-grained visual regions with toolkits and
then recognize named entities. However, they suffer from improper alignment
between entity types and visual regions or error propagation in the two-stage
manner, which finally imports irrelevant visual information into texts. In this
paper, we propose a novel end-to-end framework named MNER-QG that can
simultaneously perform MRC-based multimodal named entity recognition and query
grounding. Specifically, with the assistance of queries, MNER-QG can provide
prior knowledge of entity types and visual regions, and further enhance
representations of both texts and images. To conduct the query grounding task,
we provide manual annotations and weak supervisions that are obtained via
training a highly flexible visual grounding model with transfer learning. We
conduct extensive experiments on two public MNER datasets, Twitter2015 and
Twitter2017. Experimental results show that MNER-QG outperforms the current
state-of-the-art models on the MNER task, and also improves the query grounding
performance.
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation [46.9782192992495]
Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions.
We propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models.
arXiv Detail & Related papers (2024-06-11T13:52:29Z) - LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition [28.136662420053568]
Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions.
We propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge.
arXiv Detail & Related papers (2024-02-15T14:54:33Z) - Named Entity Recognition via Machine Reading Comprehension: A Multi-Task
Learning Approach [50.12455129619845]
Named Entity Recognition (NER) aims to extract and classify entity mentions in the text into pre-defined types.
We propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.
arXiv Detail & Related papers (2023-09-20T03:15:05Z) - Multi-task Transformer with Relation-attention and Type-attention for
Named Entity Recognition [35.44123819012004]
Named entity recognition (NER) is an important research problem in natural language processing.
This paper proposes a multi-task Transformer, which incorporates an entity boundary detection task into the named entity recognition task.
arXiv Detail & Related papers (2023-03-20T05:11:22Z) - Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe)
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z) - Good Visual Guidance Makes A Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction [88.6585431949086]
We propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction.
We regard visual representation as pluggable visual prefix to guide the textual representation for error insensitive forecasting decision.
Experiments on three benchmark datasets demonstrate the effectiveness of our method, and achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T02:10:55Z) - Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
TRansformer-based Few-shot Semantic segmentation method (TRFS)
Our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM)
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.