eProduct: A Million-Scale Visual Search Benchmark to Address Product
Recognition Challenges
- URL: http://arxiv.org/abs/2107.05856v1
- Date: Tue, 13 Jul 2021 05:28:34 GMT
- Title: eProduct: A Million-Scale Visual Search Benchmark to Address Product
Recognition Challenges
- Authors: Jiangbo Yuan, An-Ti Chiang, Wen Tang, Antonio Haro
- Abstract summary: eProduct is a benchmark dataset for training and evaluating visual search solutions in a real-world setting.
eProduct consists of a training set and an evaluation set; the training set contains 1.3M+ listing images with titles and hierarchical category labels for model development.
We present eProduct's construction steps, analyze its diversity, and report the performance of baseline models trained on it.
- Score: 8.204924070199866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale product recognition is one of the major applications of computer
vision and machine learning in the e-commerce domain. Since the number of
products is typically much larger than the number of categories of products,
image-based product recognition is often cast as a visual search rather than a
classification problem. It is also one of the instances of super fine-grained
recognition, where there are many products with slight or subtle visual
differences. It has always been a challenge to create a benchmark dataset for
training and evaluation on various visual search solutions in a real-world
setting. This motivated the creation of eProduct, a dataset of 2.5 million
product images built to accelerate development in self-supervised,
weakly-supervised, and multimodal learning for fine-grained recognition.
eProduct consists of a training set and an
evaluation set, where the training set contains 1.3M+ listing images with
titles and hierarchical category labels, for model development, and the
evaluation set includes 10,000 query and 1.1 million index images for visual
search evaluation. We present eProduct's construction steps, analyze its
diversity, and report the performance of baseline models trained on it.
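Since the abstract casts product recognition as visual search rather than classification, the core evaluation operation is nearest-neighbor retrieval of index images for each query. The sketch below is a generic brute-force cosine-similarity retrieval loop in NumPy; `build_index`, `search`, and the embedding dimensions are illustrative stand-ins, not the paper's actual system:

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize index embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the top-k most similar index images for one query."""
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q                 # cosine similarity against every index image
    return np.argsort(-scores)[:k]     # highest-scoring items first

# Toy example: 1000 "index images" with 64-d embeddings, one near-duplicate query.
rng = np.random.default_rng(0)
index = build_index(rng.standard_normal((1000, 64)))
query = index[42] + 0.01 * rng.standard_normal(64)  # slight perturbation of item 42
top = search(query, index, k=5)
```

At the benchmark's scale (1.1M index images), brute force would be replaced by an approximate nearest-neighbor index, but the ranking semantics stay the same.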
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce [20.921870288665627]
Multi-modal Item Embedding Model (MIEM) is capable of utilizing both textual information and multiple images about a product to construct meaningful product features.
MIEM has become an integral part of the Shopee image search platform.
arXiv Detail & Related papers (2023-11-29T08:09:50Z)
- Exploiting Category Names for Few-Shot Classification with Vision-Language Models [78.51975804319149]
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.
This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head.
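As a rough illustration of this idea, a classification head can be initialized with L2-normalized text embeddings of the category names, so that prediction reduces to a cosine-similarity argmax. The sketch below uses random vectors as stand-ins for real name/image embeddings; `init_head_from_names`, `classify`, and all shapes are hypothetical, not the paper's method:

```python
import numpy as np

def init_head_from_names(name_emb: np.ndarray) -> np.ndarray:
    # Classifier weights = L2-normalized text embeddings of the category names.
    return name_emb / np.linalg.norm(name_emb, axis=1, keepdims=True)

def classify(img_emb: np.ndarray, head: np.ndarray) -> int:
    # Predict the category whose name embedding is most similar to the image.
    img = img_emb / np.linalg.norm(img_emb)
    return int(np.argmax(head @ img))

# Toy stand-in: 5 category-name embeddings, one image close to category 2.
rng = np.random.default_rng(0)
names = rng.standard_normal((5, 16))
head = init_head_from_names(names)
image = names[2] + 0.05 * rng.standard_normal(16)
pred = classify(image, head)
```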
arXiv Detail & Related papers (2022-11-29T21:08:46Z)
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce [9.46186546774799]
We propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images.
We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges.
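Contrastive image-text alignment of this kind is typically trained with a symmetric InfoNCE (CLIP-style) loss, in which matched image/text pairs sit on the diagonal of a similarity matrix. A generic NumPy sketch with a hypothetical batch size and temperature (not the authors' implementation):

```python
import numpy as np

def clip_style_loss(img_emb: np.ndarray, txt_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # matched pairs on the diagonal

    def xent(l: np.ndarray) -> float:
        # Cross-entropy with the diagonal entries as the positive targets.
        l = l - l.max(axis=1, keepdims=True)
        logprob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -float(np.mean(np.diag(logprob)))

    # Average of image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly aligned pairs give a low loss; mismatched pairs give a higher one.
rng = np.random.default_rng(1)
emb = rng.standard_normal((8, 32))
aligned = clip_style_loss(emb, emb)
shuffled = clip_style_loss(emb, emb[::-1])
```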
arXiv Detail & Related papers (2022-07-01T05:16:47Z)
- Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Image-sequence (MUIsC) is able to simultaneously detect all categories of rule violations through learning.
By Dec 2021, our AGPIS framework had generated high-standard images for about 1.5 million products and achieved a 13.6% rejection rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z)
- An Empirical Investigation of Representation Learning for Imitation [76.48784376425911]
Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data.
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation.
arXiv Detail & Related papers (2022-05-16T11:23:42Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Multi-label classification of promotions in digital leaflets using textual and visual information [1.5469452301122175]
We present an end-to-end approach that classifies promotions within digital leaflets into their corresponding product categories.
Our approach can be divided into three key components: 1) region detection, 2) text recognition and 3) text classification.
We train and evaluate our models using a private dataset composed of images from digital leaflets obtained by Nielsen.
arXiv Detail & Related papers (2020-10-07T11:05:12Z)
- Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.