MAVE: A Product Dataset for Multi-source Attribute Value Extraction
- URL: http://arxiv.org/abs/2112.08663v1
- Date: Thu, 16 Dec 2021 06:48:31 GMT
- Title: MAVE: A Product Dataset for Multi-source Attribute Value Extraction
- Authors: Li Yang, Qifan Wang, Zac Yu, Anand Kulkarni, Sumit Sanghai, Bin Shu,
Jon Elsas, Bhargav Kanagal
- Abstract summary: We introduce MAVE, a new dataset to better facilitate research on product attribute value extraction.
MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories.
We propose a novel approach that effectively extracts the attribute value from the multi-source product information.
- Score: 10.429320377835241
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Attribute value extraction refers to the task of identifying values of an
attribute of interest from product information. Product attribute values are
essential in many e-commerce scenarios, such as customer service robots,
product ranking, retrieval and recommendations. In the real world, however, the
attribute values of a product are usually incomplete and vary over time, which
greatly hinders practical applications. In this paper, we introduce MAVE, a
new dataset to better facilitate research on product attribute value
extraction. MAVE is composed of a curated set of 2.2 million products from
Amazon pages, with 3 million attribute-value annotations across 1257 unique
categories. MAVE has four main and unique advantages: First, MAVE is the
largest product attribute value extraction dataset by the number of
attribute-value examples. Second, MAVE includes multi-source representations
from the product, which capture the full product information with high
attribute coverage. Third, MAVE represents a more diverse set of attributes and
values relative to what previous datasets cover. Lastly, MAVE provides a very
challenging zero-shot test set, as we empirically illustrate in the
experiments. We further propose a novel approach that effectively extracts the
attribute value from the multi-source product information. We conduct extensive
experiments with several baselines and show that MAVE is an effective dataset
for the attribute value extraction task. It also poses a very challenging task for
zero-shot attribute extraction. Data is available at
https://github.com/google-research-datasets/MAVE.
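As a rough illustration of the task described in the abstract, the sketch below treats attribute value extraction over multi-source product information as recovering annotated character spans from several text sources of one product. The class names, field names (`source`, `pid`, `begin`, `end`), the source labels and the example product are all hypothetical and chosen for illustration only; the dataset's actual schema is documented in the repository linked above.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    source: str  # hypothetical source label, e.g. "title" or "description"


@dataclass
class Evidence:
    attribute: str
    pid: int    # index into the product's paragraph list (assumed layout)
    begin: int  # character offset, inclusive
    end: int    # character offset, exclusive


def extract_values(paragraphs, evidences):
    """Group annotated value strings by attribute, keeping the source of each."""
    values = {}
    for ev in evidences:
        para = paragraphs[ev.pid]
        span = para.text[ev.begin:ev.end]
        values.setdefault(ev.attribute, []).append((span, para.source))
    return values


# A made-up product with two text sources.
paragraphs = [
    Paragraph("Acme Running Shoes, Blue", "title"),
    Paragraph("Lightweight shoes in blue mesh.", "description"),
]
# The same attribute is annotated once in each source.
evidences = [
    Evidence("Color", 0, 20, 24),
    Evidence("Color", 1, 21, 25),
]

result = extract_values(paragraphs, evidences)
print(result)
```

Keeping the source label next to each extracted value makes it easy to see which part of the product information supports an annotation, which is the point of a multi-source representation.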
Related papers
- EAVE: Efficient Product Attribute Value Extraction via Lightweight Sparse-layer Interaction [94.22610101608332]
We propose an Efficient product Attribute Value Extraction (EAVE) approach via lightweight sparse-layer interaction.
We employ a heavy encoder to separately encode the product context and attribute. The resulting non-interacting heavy representations of the context can be cached and reused for all attributes.
Our method achieves significant efficiency gains with neutral or marginal loss in performance when the context is long and number of attributes is large.
arXiv Detail & Related papers (2024-06-10T23:06:38Z)
- ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction [67.86012624533461]
ImplicitAVE is the first, publicly available multimodal dataset for implicit attribute value extraction.
The dataset includes 68k training and 1.6k testing examples across five domains.
We also explore the application of multimodal large language models (MLLMs) to implicit AVE.
arXiv Detail & Related papers (2024-04-24T01:54:40Z)
- EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM [52.016009472409166]
EIVEN is a data- and parameter-efficient generative framework for implicit attribute value extraction.
We introduce a novel Learning-by-Comparison technique to reduce model confusion.
Our experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values.
arXiv Detail & Related papers (2024-04-13T03:15:56Z)
- Using LLMs for the Extraction and Normalization of Product Attribute Values [47.098255866050835]
This paper explores the potential of using large language models (LLMs) to extract and normalize attribute values from product titles and descriptions.
We introduce the Web Data Commons - Product Attribute Value Extraction (WDC-PAVE) benchmark dataset for our experiments.
arXiv Detail & Related papers (2024-03-04T15:39:59Z)
- AE-smnsMLC: Multi-Label Classification with Semantic Matching and Negative Label Sampling for Product Attribute Value Extraction [42.79022954630978]
Product attribute value extraction plays an important role for many real-world applications in e-Commerce such as product search and recommendation.
Previous methods treat it as a sequence labeling task that needs more annotation for position of values in the product text.
We propose a classification model with semantic matching and negative label sampling for attribute value extraction.
arXiv Detail & Related papers (2023-10-11T02:22:28Z)
- Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction [6.752749933406399]
A key challenge in value extraction from e-commerce sites is how to handle a large number of attributes for diverse products.
We propose a knowledge-driven query expansion based on possible answers (values) of a query (attribute) for QA-based AVE.
arXiv Detail & Related papers (2022-06-28T19:43:57Z)
- OA-Mine: Open-World Attribute Mining for E-Commerce Products with Weak Supervision [93.26737878221073]
We study the attribute mining problem in an open-world setting to extract novel attributes and their values.
We propose a principled framework that first generates attribute value candidates and then groups them into clusters of attributes.
Our model significantly outperforms strong baselines and can generalize to unseen attributes and product types.
arXiv Detail & Related papers (2022-04-29T04:16:04Z)
- AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z)
- Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product [40.46223408546036]
Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product recommendations, and product retrieval.
In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
We propose a multimodal method to jointly predict product attributes and extract values from textual product descriptions with the help of the product images.
arXiv Detail & Related papers (2020-09-15T15:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.