Multi-label classification of promotions in digital leaflets using textual and visual information
- URL: http://arxiv.org/abs/2010.03331v1
- Date: Wed, 7 Oct 2020 11:05:12 GMT
- Title: Multi-label classification of promotions in digital leaflets using textual and visual information
- Authors: Roberto Arroyo, David Jiménez-Cabello and Javier Martínez-Cebrián
- Abstract summary: We present an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories.
Our approach can be divided into three key components: 1) region detection, 2) text recognition and 3) text classification.
We train and evaluate our models using a private dataset composed of images from digital leaflets obtained by Nielsen.
- Score: 1.5469452301122175
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Product descriptions in e-commerce platforms contain detailed and valuable
information about retailers' assortment. In particular, coding promotions within
digital leaflets is of great interest in e-commerce, as digital leaflets capture
consumers' attention by showing regular promotions for different products.
However, this information is embedded into images, making it difficult to
extract and process for downstream tasks. In this paper, we present an
end-to-end approach that classifies promotions within digital leaflets into
their corresponding product categories using both visual and textual
information. Our approach can be divided into three key components: 1) region
detection, 2) text recognition and 3) text classification. In many cases, a
single promotion refers to multiple product categories, so we introduce a
multi-label objective in the classification head. We demonstrate the
effectiveness of our approach for two separate tasks: 1) image-based detection
of the descriptions for each individual promotion and 2) multi-label
classification of the product categories using the text from the product
descriptions. We train and evaluate our models using a private dataset composed
of images from digital leaflets obtained by Nielsen. Results show that we
consistently outperform the proposed baseline by a large margin in all the
experiments.
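The multi-label objective mentioned in the abstract can be illustrated with a generic formulation: one sigmoid output per product category, trained with binary cross-entropy, and a probability threshold at inference. This is a minimal sketch under our own assumptions, not the authors' classification head; the category names, the example logits, and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy, treating each category as an independent yes/no label.

    Uses -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z)
    so that confident predictions never evaluate log(0).
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and of equal length")
    total = sum(y * softplus(-z) + (1 - y) * softplus(z) for z, y in zip(logits, targets))
    return total / len(logits)


def predict_categories(logits: Sequence[float], categories: Sequence[str],
                       threshold: float = 0.5) -> List[str]:
    """Return every category whose sigmoid probability reaches the (hypothetical) threshold."""
    return [c for z, c in zip(logits, categories) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    # Hypothetical categories and classifier logits for the text of one promotion.
    categories = ["dairy", "bakery", "beverages", "cleaning"]
    logits = [2.0, 1.5, -3.0, -0.5]
    print(predict_categories(logits, categories))          # ['dairy', 'bakery']
    print(round(multilabel_bce(logits, [1, 1, 0, 0]), 4))  # 0.2128
```

Unlike a softmax head, which forces categories to compete for a single probability mass, independent sigmoids let one promotion (for example, a bundle of milk and bread) be assigned to several categories at once.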
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown an exceptional capability to produce high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary capabilities, to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Mutual Query Network for Multi-Modal Product Image Segmentation [13.192334066413837]
We propose a mutual query network to segment products based on both visual and linguistic modalities.
To promote research in this field, we also construct a Multi-Modal Product dataset (MMPS).
The proposed method significantly outperforms the state-of-the-art methods on MMPS.
arXiv Detail & Related papers (2023-06-26T03:18:38Z)
- Product Information Extraction using ChatGPT [69.12244027050454]
This paper explores the potential of ChatGPT for extracting attribute/value pairs from product descriptions.
Our results show that ChatGPT achieves a performance similar to a pre-trained language model but requires much smaller amounts of training data and computation for fine-tuning.
arXiv Detail & Related papers (2023-06-23T09:30:01Z)
- Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval [12.588713044749177]
Same-style products retrieval plays an important role in e-commerce platforms.
We propose a unified vision-language modeling method for e-commerce same-style products retrieval.
It is capable of cross-modal product-to-product retrieval, as well as style transfer and user-interactive search.
arXiv Detail & Related papers (2023-02-10T07:24:23Z)
- Visually Similar Products Retrieval for Shopsy [0.0]
We design a visual search system for reseller commerce using a multi-task learning approach.
Our model consists of three different tasks: attribute classification, triplet ranking and variational autoencoder (VAE).
arXiv Detail & Related papers (2022-10-10T10:59:18Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
Our method can directly segment objects of arbitrary categories and, on three benchmark datasets, outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- PAM: Understanding Product Images in Cross Product Category Attribute Extraction [40.332066960433245]
This work proposes a more inclusive framework that fully utilizes different modalities for attribute extraction.
Inspired by recent work in visual question answering, we use a transformer-based sequence-to-sequence model to fuse representations of product text, Optical Character Recognition (OCR) tokens and visual objects detected in the product image.
The framework is further extended with the capability to extract attribute values across multiple product categories with a single model.
arXiv Detail & Related papers (2021-06-08T18:30:17Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
To tackle this issue, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)
- Automatic Validation of Textual Attribute Values in E-commerce Catalog by Learning with Limited Labeled Data [61.789797281676606]
We propose a novel meta-learning latent variable approach, called MetaBridge.
It can learn transferable knowledge from a subset of categories with limited labeled data.
It can capture the uncertainty of never-seen categories with unlabeled data.
arXiv Detail & Related papers (2020-06-15T21:31:05Z)