eProduct: A Million-Scale Visual Search Benchmark to Address Product
Recognition Challenges
- URL: http://arxiv.org/abs/2107.05856v1
- Date: Tue, 13 Jul 2021 05:28:34 GMT
- Title: eProduct: A Million-Scale Visual Search Benchmark to Address Product
Recognition Challenges
- Authors: Jiangbo Yuan, An-Ti Chiang, Wen Tang, Antonio Haro
- Abstract summary: eProduct is a benchmark dataset for training and evaluating visual search solutions in a real-world setting.
eProduct consists of a training set and an evaluation set; the training set contains 1.3M+ listing images with titles and hierarchical category labels for model development.
We present eProduct's construction steps, analyze its diversity, and report the performance of baseline models trained on it.
- Score: 8.204924070199866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale product recognition is one of the major applications of computer
vision and machine learning in the e-commerce domain. Since the number of
products is typically much larger than the number of categories of products,
image-based product recognition is often cast as a visual search rather than a
classification problem. It is also one of the instances of super fine-grained
recognition, where there are many products with slight or subtle visual
differences. It has always been a challenge to create a benchmark dataset for
training and evaluation on various visual search solutions in a real-world
setting. This motivated the creation of eProduct, a dataset of 2.5 million
product images built to accelerate development in self-supervised,
weakly-supervised, and multimodal learning for fine-grained recognition.
eProduct consists of a training set and an
evaluation set, where the training set contains 1.3M+ listing images with
titles and hierarchical category labels, for model development, and the
evaluation set includes 10,000 query and 1.1 million index images for visual
search evaluation. We present eProduct's construction steps, analyze its
diversity, and report the performance of baseline models trained on it.
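Since the abstract casts product recognition as visual search rather than classification, the core evaluation operation is nearest-neighbor retrieval of index images for each query. The sketch below is a generic brute-force cosine-similarity retrieval loop in NumPy; `build_index`, `search`, and the embedding dimensions are illustrative stand-ins, not the paper's actual system:

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize index embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the top-k most similar index images for one query."""
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q                 # cosine similarity against every index image
    return np.argsort(-scores)[:k]     # highest-scoring items first

# Toy example: 1000 "index images" with 64-d embeddings, one near-duplicate query.
rng = np.random.default_rng(0)
index = build_index(rng.standard_normal((1000, 64)))
query = index[42] + 0.01 * rng.standard_normal(64)  # slight perturbation of item 42
top = search(query, index, k=5)
```

At the benchmark's scale (1.1M index images), brute force would be replaced by an approximate nearest-neighbor index, but the ranking semantics stay the same.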
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
- Exploiting Category Names for Few-Shot Classification with Vision-Language Models [78.51975804319149]
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.
This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head.
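As a rough illustration of this idea, a classification head can be initialized with L2-normalized text embeddings of the category names, so that prediction reduces to a cosine-similarity argmax. The sketch below uses random vectors as stand-ins for real name/image embeddings; `init_head_from_names`, `classify`, and all shapes are hypothetical, not the paper's method:

```python
import numpy as np

def init_head_from_names(name_emb: np.ndarray) -> np.ndarray:
    # Classifier weights = L2-normalized text embeddings of the category names.
    return name_emb / np.linalg.norm(name_emb, axis=1, keepdims=True)

def classify(img_emb: np.ndarray, head: np.ndarray) -> int:
    # Predict the category whose name embedding is most similar to the image.
    img = img_emb / np.linalg.norm(img_emb)
    return int(np.argmax(head @ img))

# Toy stand-in: 5 category-name embeddings, one image close to category 2.
rng = np.random.default_rng(0)
names = rng.standard_normal((5, 16))
head = init_head_from_names(names)
image = names[2] + 0.05 * rng.standard_normal(16)
pred = classify(image, head)
```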
arXiv Detail & Related papers (2022-11-29T21:08:46Z)
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce [9.46186546774799]
We propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images.
We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges.
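Contrastive image-text alignment of this kind is typically trained with a symmetric InfoNCE (CLIP-style) loss, in which matched image/text pairs sit on the diagonal of a similarity matrix. A generic NumPy sketch with a hypothetical batch size and temperature (not the authors' implementation):

```python
import numpy as np

def clip_style_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # matched pairs on the diagonal

    def xent(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal entries as the positive targets.
        l = l - l.max(axis=1, keepdims=True)
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(logprob)))

    # Average of image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly aligned pairs give a low loss; mismatched pairs give a higher one.
rng = np.random.default_rng(1)
emb = rng.standard_normal((8, 32))
aligned = clip_style_loss(emb, emb)
shuffled = clip_style_loss(emb, emb[::-1])
```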
arXiv Detail & Related papers (2022-07-01T05:16:47Z)
- Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Image-sequence (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a 13.6% rejection rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)
- An Empirical Investigation of Representation Learning for Imitation [76.48784376425911]
Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data.
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation.
arXiv Detail & Related papers (2022-05-16T11:23:42Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Multi-label classification of promotions in digital leaflets using textual and visual information [1.5469452301122175]
We present an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories.
Our approach can be divided into three key components: 1) region detection, 2) text recognition and 3) text classification.
We train and evaluate our models using a private dataset composed of images from digital leaflets obtained by Nielsen.
arXiv Detail & Related papers (2020-10-07T11:05:12Z)
- Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.