A multimodal deep learning approach for named entity recognition from social media
- URL: http://arxiv.org/abs/2001.06888v3
- Date: Sun, 12 Jul 2020 12:29:04 GMT
- Title: A multimodal deep learning approach for named entity recognition from social media
- Authors: Meysam Asgari-Chenaghlu, M.Reza Feizi-Derakhshi, Leili Farzinvash, M. A. Balafar, Cina Motamed
- Abstract summary: We propose two novel deep learning approaches utilizing multimodal deep learning and Transformers.
Both of our approaches use image features from short social media posts to provide better results on the NER task.
Experimental results, measured by precision, recall, and F1 score, show that our work outperforms other state-of-the-art NER solutions.
- Score: 1.9511777443446214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named Entity Recognition (NER) from social media posts is a challenging task.
The user-generated content that characterizes social media is noisy and
contains grammatical and linguistic errors. This noise makes tasks such as
named entity recognition much harder. We propose two novel deep learning
approaches utilizing multimodal deep learning and Transformers. Both of our
approaches use image features from short social media posts to provide better
results on the NER task. In the first approach, we extract image features
using InceptionV3 and use fusion to combine textual and image features. This
yields more reliable named entity recognition when users provide images
related to the entities. In the second approach, we combine image features
with text and feed them into a BERT-like Transformer. The experimental
results, measured by precision, recall, and F1 score, show that our
approaches outperform other state-of-the-art NER solutions.
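Below is a minimal PyTorch sketch of the two approaches as described in the abstract. Only the InceptionV3 backbone and the BERT-like Transformer are named in the abstract; everything else here (the BiLSTM text encoder, concatenation as the fusion operator, prepending the projected image vector as an extra token, and all layer sizes) is an illustrative assumption, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

# Shared step: a 2048-d image feature vector from InceptionV3
# (the backbone named in the abstract; the rest is illustrative).
inception = models.inception_v3(weights="DEFAULT")
inception.fc = nn.Identity()   # drop the 1000-way classifier head
inception.eval()               # the aux classifier is inactive in eval mode

@torch.no_grad()
def image_features(img):       # img: (batch, 3, 299, 299), normalized
    return inception(img)      # -> (batch, 2048)

# Approach 1: fuse the image vector with per-token text features, then
# tag each token (BiLSTM and concatenation are assumptions, not the
# paper's stated design).
class FusionNER(nn.Module):
    def __init__(self, vocab_size, num_tags, txt_dim=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, txt_dim)
        self.lstm = nn.LSTM(txt_dim, txt_dim, batch_first=True,
                            bidirectional=True)
        self.tagger = nn.Linear(2 * txt_dim + img_dim, num_tags)

    def forward(self, token_ids, img_feat):
        h, _ = self.lstm(self.embed(token_ids))          # (B, T, 2*txt_dim)
        img = img_feat.unsqueeze(1).expand(-1, h.size(1), -1)
        return self.tagger(torch.cat([h, img], dim=-1))  # (B, T, num_tags)
```

For the second approach, one plausible reading of "image features combined with text" is to project the image vector to the Transformer's hidden size and prepend it to the token embeddings as an extra position; the snippet below sketches that reading with a stock BERT encoder.

```python
from transformers import BertModel, BertTokenizerFast

bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
img_proj = nn.Linear(2048, bert.config.hidden_size)

def encode(text, img_feat):                    # img_feat: (1, 2048)
    enc = tokenizer(text, return_tensors="pt")
    words = bert.embeddings.word_embeddings(enc["input_ids"])  # (1, T, H)
    img_tok = img_proj(img_feat).unsqueeze(1)                  # (1, 1, H)
    embeds = torch.cat([img_tok, words], dim=1)
    mask = torch.cat([torch.ones(1, 1, dtype=torch.long),
                      enc["attention_mask"]], dim=1)
    out = bert(inputs_embeds=embeds, attention_mask=mask)
    return out.last_hidden_state[:, 1:]   # per-token states for a tag head
```

In both sketches, a linear tag head over the per-token states would produce the entity labels scored by the precision, recall, and F1 metrics the abstract reports; the paper's actual layer sizes and fusion details may differ.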
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z)
Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
A Generative Approach for Wikipedia-Scale Visual Entity Recognition [56.55633052479446]
We address the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
We introduce a novel Generative Entity Recognition framework, which learns to auto-regressively decode a semantic and discriminative "code" identifying the target entity.
arXiv Detail & Related papers (2024-03-04T13:47:30Z)
Multi-Granularity Cross-Modality Representation Learning for Named Entity Recognition on Social Media [11.235498285650142]
Named Entity Recognition (NER) on social media refers to discovering and classifying entities from unstructured free-form content.
This work introduces multi-granularity cross-modality representation learning.
Experiments show that the proposed approach achieves SOTA or near-SOTA performance on two benchmark datasets of tweets.
arXiv Detail & Related papers (2022-10-19T15:14:55Z)
Flat Multi-modal Interaction Transformer for Named Entity Recognition [1.7605709999848573]
Multi-modal named entity recognition (MNER) aims at identifying entity spans and recognizing their categories in social media posts with the aid of images.
We propose a Flat Multi-modal Interaction Transformer (FMIT) for MNER.
We transform the fine-grained semantic representation of the vision and text into a unified lattice structure and design a novel relative position encoding to match different modalities in Transformer.
arXiv Detail & Related papers (2022-08-23T15:25:44Z)
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER [4.510210055307459]
Multimodal named entity recognition (MNER) has utilized images to improve the accuracy of NER in tweets.
We introduce a method of text-image relation propagation into the multimodal BERT model.
We propose a multitask algorithm to train on the MNER datasets.
arXiv Detail & Related papers (2021-02-05T02:45:30Z)
Can images help recognize entities? A study of the role of images for Multimodal NER [20.574849371747685]
Multimodal named entity recognition (MNER) requires bridging the gap between language understanding and visual context.
While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model's ability to leverage multimodal interactions remains poorly understood.
arXiv Detail & Related papers (2020-10-23T23:41:51Z)
Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.