Automated Extraction of Fine-Grained Standardized Product Information
from Unstructured Multilingual Web Data
- URL: http://arxiv.org/abs/2302.12139v1
- Date: Thu, 23 Feb 2023 16:26:11 GMT
- Title: Automated Extraction of Fine-Grained Standardized Product Information
from Unstructured Multilingual Web Data
- Authors: Alexander Flick and Sebastian Jäger and Ivana Trajanovska and Felix
Biessmann
- Abstract summary: We show how recent advances in machine learning, combined with a recently published multilingual data set, enable robust product attribute extraction.
Our models can reliably predict product attributes across online shops, languages, or both.
- Score: 66.21317300595483
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Extracting structured information from unstructured data is one of the key
challenges in modern information retrieval applications, including e-commerce.
Here, we demonstrate how recent advances in machine learning, combined with a
recently published multilingual data set with standardized fine-grained product
category information, enable robust product attribute extraction in challenging
transfer learning settings. Our models can reliably predict product attributes
across online shops, languages, or both. Furthermore, we show that our models
can be used to match product taxonomies between online retailers.
Related papers
- A Multimodal In-Context Tuning Approach for E-Commerce Product
Description Generation [47.70824723223262]
We propose a new setting for generating product descriptions from images, augmented by marketing keywords.
We present a simple and effective Multimodal In-Context Tuning approach, named ModICT, which introduces a similar product sample as the reference.
Experiments demonstrate that ModICT significantly improves the accuracy (by up to 3.3% on Rouge-L) and diversity (by up to 9.4% on D-5) of generated results compared to conventional methods.
arXiv Detail & Related papers (2024-02-21T07:38:29Z)
- Enhanced E-Commerce Attribute Extraction: Innovating with Decorative
Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Fields (CRFs) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation.
Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values.
Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
arXiv Detail & Related papers (2023-12-09T08:26:30Z)
- From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web [118.67589717634281]
Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
arXiv Detail & Related papers (2023-11-19T10:43:43Z)
- Product Information Extraction using ChatGPT [69.12244027050454]
This paper explores the potential of ChatGPT for extracting attribute/value pairs from product descriptions.
Our results show that ChatGPT achieves a performance similar to a pre-trained language model but requires much smaller amounts of training data and computation for fine-tuning.
arXiv Detail & Related papers (2023-06-23T09:30:01Z)
- Exploiting Knowledge Graphs for Facilitating Product/Service Discovery [1.2691047660244332]
This work presents a cost-effective solution for e-commerce on the Data Web by employing an unsupervised approach for data classification.
The proposed architecture describes available products in web language OWL and stores them in a triple store.
User input specifications for certain products are matched against the available product categories to generate a knowledge graph.
arXiv Detail & Related papers (2020-10-11T10:22:10Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog
by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)
- Cross-Lingual Low-Resource Set-to-Description Retrieval for Global
E-Commerce [83.72476966339103]
Cross-lingual information retrieval is a new task in cross-border e-commerce.
We propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping.
Experimental results indicate that our proposed CLMN yields impressive results on the challenging task.
arXiv Detail & Related papers (2020-05-17T08:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.