Related papers: Multimodal Entity Linking for Tweets

URL: http://arxiv.org/abs/2104.03236v1
Date: Wed, 7 Apr 2021 16:40:23 GMT
Title: Multimodal Entity Linking for Tweets
Authors: Omar Adjali and Romaric Besançon and Olivier Ferret and Herve Le Borgne and Brigitte Grau
Abstract summary: Multimodal entity linking (MEL) is an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB). We propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB. Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts.
Score: 6.439761523935613
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: In many information extraction applications, entity linking (EL) has emerged as a crucial task that allows leveraging information about named entities from a knowledge base. In this paper, we address the task of multimodal entity linking (MEL), an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB). First, we propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB. Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts. We demonstrate the effectiveness of the proposed model by evaluating it on the proposed dataset and highlight the importance of leveraging visual information when it is available.

Related papers

VP-MEL: Visual Prompts Guided Multimodal Entity Linking [16.463229055333407]
Multimodal entity linking (MEL) is a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB). Existing MEL methods often rely on mention words as retrieval cues, which limits their ability to effectively utilize information from both images and text. We propose a framework named IIER, which enhances visual feature extraction using visual prompts and leverages the pretrained Detective-VLM model to capture latent information.
arXiv Detail & Related papers (2024-12-09T18:06:39Z)
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images. In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS). Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z)
DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model [16.20833396645551]
We propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of the Large Language Model (LLM) for visual understanding.
arXiv Detail & Related papers (2024-06-27T15:18:23Z)
Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction. We reformulate the task to be entity-centric, enabling the use of diverse metrics. We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes [46.67148487519558]
We propose attribute-aware multimodal entity linking. The input consists of a mention described with a text paragraph and images. The goal is to predict the corresponding target entity from a multimodal knowledge base.
arXiv Detail & Related papers (2023-05-24T05:01:48Z)
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images. We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities. The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
Visual Named Entity Linking: A New Dataset and A Baseline [61.38231023490981]
We consider a purely Visual-based Named Entity Linking (VNEL) task, where the input only consists of an image. We propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL). We present a high-quality human-annotated visual person linking dataset, named WIKIPerson.
arXiv Detail & Related papers (2022-11-09T13:27:50Z)
Injecting Knowledge Base Information into End-to-End Joint Entity and Relation Extraction and Coreference Resolution [13.973471173349072]
We study how to inject information from a knowledge base (KB) into such an IE model, based on unsupervised entity linking. The KB entity representations used are learned from either (i) hyperlinked text documents (Wikipedia) or (ii) a knowledge graph (Wikidata).
arXiv Detail & Related papers (2021-07-05T21:49:02Z)
InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction [19.176183245280267]
We introduce two widely available sources in knowledge bases, namely entity descriptions and multi-grained entity types. An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL).
arXiv Detail & Related papers (2020-12-17T02:49:46Z)
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks learning over raw text with the guidance from knowledge graphs. Building upon entity-level masked language models, our first contribution is an entity masking scheme. In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
arXiv Detail & Related papers (2020-04-29T14:22:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.