A multimodal deep learning approach for named entity recognition from
social media
- URL: http://arxiv.org/abs/2001.06888v3
- Date: Sun, 12 Jul 2020 12:29:04 GMT
- Title: A multimodal deep learning approach for named entity recognition from
social media
- Authors: Meysam Asgari-Chenaghlu, M.Reza Feizi-Derakhshi, Leili Farzinvash, M.
A. Balafar, Cina Motamed
- Abstract summary: We propose two novel deep learning approaches utilizing multimodal deep learning and Transformers.
Both of our approaches use image features from short social media posts to provide better results on the NER task.
The experimental results, namely the precision, recall, and F1 score metrics, show the superiority of our work compared to other state-of-the-art NER solutions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named Entity Recognition (NER) from social media posts is a challenging task.
The user-generated content that characterizes social media is noisy and contains
grammatical and linguistic errors, which makes tasks such as named entity
recognition much harder. We propose two novel deep learning approaches
utilizing multimodal deep learning and Transformers. Both of our approaches use
image features from short social media posts to provide better results on the
NER task. In the first approach, we extract image features using InceptionV3
and use fusion to combine textual and image features. This yields more reliable
named entity recognition when the images related to the entities are provided
by the user. In the second approach, we combine image features with the text
and feed them into a BERT-like Transformer. The experimental results, namely
the precision, recall, and F1 score metrics, show the superiority of our work
compared to other state-of-the-art NER solutions.
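The fusion step of the first approach can be sketched as concatenating a pooled InceptionV3 image vector onto each token's text embedding. This is a minimal illustration, not the paper's implementation: the feature dimensions, the use of concatenation as the fusion operator, and the function names are all assumptions.

```python
import numpy as np

# Hypothetical dimensions; the paper does not specify them here.
IMG_DIM = 2048   # InceptionV3's final pooled feature size
TXT_DIM = 768    # a BERT-like token embedding size

def fuse_features(image_feat, token_embeds):
    """Concatenation-style early fusion: broadcast the single pooled
    image vector onto every token embedding of the post."""
    n_tokens = token_embeds.shape[0]
    tiled = np.tile(image_feat, (n_tokens, 1))            # (n_tokens, IMG_DIM)
    return np.concatenate([token_embeds, tiled], axis=1)  # (n_tokens, TXT_DIM + IMG_DIM)

image_feat = np.random.rand(IMG_DIM)          # one image per post
token_embeds = np.random.rand(12, TXT_DIM)    # a 12-token tweet
fused = fuse_features(image_feat, token_embeds)
print(fused.shape)  # (12, 2816)
```

The fused per-token vectors would then feed a sequence tagger (or, in the second approach, the image features are prepended to the token sequence of a BERT-like Transformer instead).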
Related papers
- Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation [90.71613903956451]
Text-to-image retrieval is a fundamental task in multimedia processing.
We propose an autoregressive voken generation method, named AVG.
We show that AVG achieves superior results in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-24T13:39:51Z)
- Transformer based Multitask Learning for Image Captioning and Object Detection [13.340784876489927]
This work introduces a novel multitask learning framework that combines image captioning and object detection into a joint model.
We propose TICOD, Transformer-based Image Captioning and Object detection model for jointly training both tasks.
Our model outperforms the baselines from image captioning literature by achieving a 3.65% improvement in BERTScore.
arXiv Detail & Related papers (2024-03-10T19:31:13Z)
- A Generative Approach for Wikipedia-Scale Visual Entity Recognition [56.55633052479446]
We address the task of mapping a given query image to one of the 6 million existing entities in Wikipedia.
We introduce a novel Generative Entity Recognition framework, which learns to auto-regressively decode a semantic and discriminative code identifying the target entity.
arXiv Detail & Related papers (2024-03-04T13:47:30Z)
- Multi-Granularity Cross-Modality Representation Learning for Named Entity Recognition on Social Media [11.235498285650142]
Named Entity Recognition (NER) on social media refers to discovering and classifying entities from unstructured free-form content.
This work introduces multi-granularity cross-modality representation learning.
Experiments show that our proposed approach can achieve the SOTA or approximate SOTA performance on two benchmark datasets of tweets.
arXiv Detail & Related papers (2022-10-19T15:14:55Z)
- Flat Multi-modal Interaction Transformer for Named Entity Recognition [1.7605709999848573]
Multi-modal named entity recognition (MNER) aims at identifying entity spans and recognizing their categories in social media posts with the aid of images.
We propose a Flat Multi-modal Interaction Transformer (FMIT) for MNER.
We transform the fine-grained semantic representation of the vision and text into a unified lattice structure and design a novel relative position encoding to match different modalities in Transformer.
arXiv Detail & Related papers (2022-08-23T15:25:44Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
- Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification [78.45528514468836]
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions.
We propose a Semantically Self-Aligned Network (SSAN) to handle the above problems.
To expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES.
arXiv Detail & Related papers (2021-07-27T08:26:47Z)
- Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph [96.95815946327079]
It is difficult to learn the association between named entities and visual cues due to the long-tail distribution of named entities.
We propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities.
arXiv Detail & Related papers (2021-07-26T05:50:41Z)
- RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER [4.510210055307459]
Multimodal named entity recognition (MNER) has utilized images to improve the accuracy of NER in tweets.
We introduce a method of text-image relation propagation into the multimodal BERT model.
We propose a multitask algorithm to train on the MNER datasets.
arXiv Detail & Related papers (2021-02-05T02:45:30Z)
- Can images help recognize entities? A study of the role of images for Multimodal NER [20.574849371747685]
Multimodal named entity recognition (MNER) requires to bridge the gap between language understanding and visual context.
While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model's ability to leverage multimodal interactions remains poorly understood.
arXiv Detail & Related papers (2020-10-23T23:41:51Z)
- Text as Neural Operator: Image Manipulation by Text Instruction [68.53181621741632]
In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects.
The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image.
We show that the proposed model performs favorably against recent strong baselines on three public datasets.
arXiv Detail & Related papers (2020-08-11T07:07:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.