AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
- URL: http://arxiv.org/abs/2305.14725v1
- Date: Wed, 24 May 2023 05:01:48 GMT
- Title: AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
- Authors: Barry Menglong Yao, Yu Chen, Qifan Wang, Sijia Wang, Minqian Liu,
Zhiyang Xu, Licheng Yu, Lifu Huang
- Abstract summary: We propose attribute-aware multimodal entity linking, where the input is a mention described with a text and image.
The goal is to predict the corresponding target entity from a multimodal knowledge base.
To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products.
- Score: 22.158388220889865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose attribute-aware multimodal entity linking, where the input is a
mention described with a text and image, and the goal is to predict the
corresponding target entity from a multimodal knowledge base (KB) where each
entity is also described with a text description, a visual image and a set of
attributes and values. To support this research, we construct AMELI, a
large-scale dataset consisting of 18,472 reviews and 35,598 products. To
establish baseline performance on AMELI, we experiment with the current
state-of-the-art multimodal entity linking approaches and our enhanced
attribute-aware model and demonstrate the importance of incorporating the
attribute information into the entity linking process. To the best of our
knowledge, we are the first to build a benchmark dataset and solutions for the
attribute-aware multimodal entity linking task. Datasets and code will be made
publicly available.
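The task setup described in the abstract can be sketched as minimal data structures: a mention carrying text and an image, and KB entities carrying a description, an image, and attribute-value pairs. The names and the naive attribute-overlap scorer below are illustrative assumptions, not the paper's actual model:

```python
from dataclasses import dataclass, field

@dataclass
class Mention:
    # A product mention from a review: free text plus an image reference.
    text: str
    image_path: str

@dataclass
class Entity:
    # A KB product entry: text description, image, and fine-grained attributes.
    description: str
    image_path: str
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "black"}

def attribute_overlap(mention: Mention, entity: Entity) -> int:
    # Toy attribute signal: count attribute values that literally appear in
    # the mention text (the paper's model uses learned multimodal encoders).
    text = mention.text.lower()
    return sum(1 for v in entity.attributes.values() if str(v).lower() in text)

def link(mention: Mention, kb: list) -> Entity:
    # Pick the entity whose attribute values best match the mention text.
    return max(kb, key=lambda e: attribute_overlap(mention, e))
```

Even this string-matching stand-in illustrates why attributes help: two products with near-identical descriptions and images can be disambiguated by a single attribute value mentioned in the review.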
Related papers
- DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model [16.20833396645551]
We propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets.
We also propose a method: Dynamically Integrate Multimodal information with knowledge base (DIM), employing the capability of the Large Language Model (LLM) for visual understanding.
arXiv Detail & Related papers (2024-06-27T15:18:23Z)
- MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning [33.12021227971062]
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and to recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
arXiv Detail & Related papers (2024-06-18T16:24:48Z)
- EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM [52.016009472409166]
EIVEN is a data- and parameter-efficient generative framework for implicit attribute value extraction.
We introduce a novel Learning-by-Comparison technique to reduce model confusion.
Our experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values.
arXiv Detail & Related papers (2024-04-13T03:15:56Z)
- MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization [93.5217515566437]
Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics.
Existing MPS methods can produce promising results, but they still lack end-to-end product summarization.
We propose an end-to-end multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce.
arXiv Detail & Related papers (2023-08-22T11:00:09Z)
- MM-GEF: Multi-modal representation meet collaborative filtering [51.04679619309803]
We propose a graph-based item structure enhancement method, MM-GEF (Multi-Modal recommendation with Graph Early-Fusion).
MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals.
arXiv Detail & Related papers (2023-08-14T15:47:36Z)
- Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment [14.658282035561792]
We propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA).
Our approach achieves excellent performance compared to its competitors.
arXiv Detail & Related papers (2023-04-04T06:39:36Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks.
We then train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z)
- Multimodal Entity Linking for Tweets [6.439761523935613]
Multimodal entity linking (MEL) is an emerging research field in which textual and visual information is used to map an ambiguous mention to an entity in a knowledge base (KB).
We propose a method for building a fully annotated Twitter dataset for MEL, where entities are defined in a Twitter KB.
Then, we propose a model for jointly learning a representation of both mentions and entities from their textual and visual contexts.
arXiv Detail & Related papers (2021-04-07T16:40:23Z)
- Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment [100.19568734815732]
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs.
Attribute triples can also provide a crucial alignment signal but have not been well explored yet.
We propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently.
arXiv Detail & Related papers (2020-10-07T08:03:58Z)
- Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product [40.46223408546036]
Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval.
In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
We propose a multimodal method to jointly predict product attributes and extract values from textual product descriptions with the help of the product images.
arXiv Detail & Related papers (2020-09-15T15:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.