Machine Learning approaches to do size based reasoning on Retail Shelf objects to classify product variants
- URL: http://arxiv.org/abs/2110.03783v1
- Date: Thu, 7 Oct 2021 20:29:07 GMT
- Title: Machine Learning approaches to do size based reasoning on Retail Shelf objects to classify product variants
- Authors: Muktabh Mayank Srivastava, Pratyush Kumar
- Abstract summary: Deep learning based computer vision methods can be used to detect products on retail shelves and then classify them.
Some products come in different-sized variants that look exactly the same visually; the only way to differentiate them is to compare their sizes with those of other products on the shelf.
This makes distinguishing size-based variants from one another using computer vision algorithms alone impractical.
- Score: 3.3767251810292955
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There has been a surge in the number of Machine Learning methods to analyze
images of products kept on retail shelves. Deep learning based computer vision
methods can be used to detect products on retail shelves and then classify
them. However, some products come in different-sized variants that look
exactly the same visually, and the only way to differentiate them is to compare
their sizes with those of other products on the shelf. This makes distinguishing
size-based variants from one another using computer vision algorithms alone
impractical. In this work, we propose methods to ascertain the size variant of a
product as a downstream task to an object detector, which extracts products from
the shelf, and a classifier, which determines the product brand. Product variant
determination is the task of assigning a size variant to each product of a brand,
based on the sizes of the bounding boxes and the brands predicted by the
classifier. While gradient boosting based methods work well for products whose
facings are clear and distinct, a noise-accommodating neural network method is
proposed for cases where the products are stacked irregularly.
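To illustrate the downstream variant-determination step, the sketch below builds relative-size features from detector bounding boxes and brand predictions and feeds them to a gradient boosting classifier. It is a minimal sketch assuming scikit-learn and hypothetical field names (`brand`, box corner coordinates); the paper's actual features, model settings, and noise-accommodating neural network are not reproduced here.

```python
# Minimal sketch (not the authors' code): size-variant classification from
# relative bounding-box features using gradient boosting (scikit-learn).
# Field names and feature choices below are assumptions for illustration.
from dataclasses import dataclass
from typing import List

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


@dataclass
class Detection:
    brand: str   # brand predicted by the upstream classifier
    x1: float    # bounding box from the upstream object detector
    y1: float
    x2: float
    y2: float


def relative_size_features(det: Detection, shelf: List[Detection]) -> np.ndarray:
    """Compare one product's box to same-brand boxes in the same shelf image."""
    w, h = det.x2 - det.x1, det.y2 - det.y1
    peers = [d for d in shelf if d.brand == det.brand and d is not det]
    if peers:
        peer_w = float(np.mean([p.x2 - p.x1 for p in peers]))
        peer_h = float(np.mean([p.y2 - p.y1 for p in peers]))
    else:                       # single facing: fall back to its own size
        peer_w, peer_h = w, h
    return np.array([w, h, w / peer_w, h / peer_h, (w * h) / (peer_w * peer_h)])


def fit_variant_classifier(shelves: List[List[Detection]], labels: List[str]):
    """labels: one size-variant label (e.g. '500ml') per detection, flattened."""
    X = np.stack([relative_size_features(d, shelf)
                  for shelf in shelves for d in shelf])
    y = np.asarray(labels)
    return GradientBoostingClassifier().fit(X, y)
```

For irregularly stacked products, the noise-accommodating neural network mentioned in the abstract would take the place of the gradient boosting model; the feature construction above is only one plausible way to encode relative size.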
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z) - Improving Online Lane Graph Extraction by Object-Lane Clustering [106.71926896061686]
We propose an architecture and loss formulation to improve the accuracy of local lane graph estimates.
The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers.
We show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods.
arXiv Detail & Related papers (2023-07-20T15:21:28Z) - VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame
Filtration for Automatic Retail Checkout [0.7250756081498245]
We propose to segment and classify individual frames from a video sequence.
The segmentation method consists of a unified single product item- and hand-segmentation followed by entropy masking.
Our best system achieves 3rd place in the AI City Challenge 2022 Track 4 with an F1 score of 0.4545.
arXiv Detail & Related papers (2022-04-23T08:54:28Z) - Text Classification for Predicting Multi-level Product Categories [0.0]
In an online shopping platform, a detailed classification of the products facilitates user navigation.
In this study, we focus on product title classification of the grocery products.
arXiv Detail & Related papers (2021-09-02T17:00:05Z) - eProduct: A Million-Scale Visual Search Benchmark to Address Product
Recognition Challenges [8.204924070199866]
eProduct is a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting.
eProduct consists of a training set and an evaluation set; the training set contains 1.3M+ listing images with titles and hierarchical category labels for model development.
We present eProduct's construction steps, analyze its diversity, and report the performance of baseline models trained on it.
arXiv Detail & Related papers (2021-07-13T05:28:34Z) - Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
A Visual Transformer models non-local visual concept dependencies between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z) - Interpretable Methods for Identifying Product Variants [0.2589904091148018]
We introduce a novel approach to identifying product variants.
It combines both constrained clustering and tailored NLP techniques.
We design the algorithm to meet certain business criteria, including high accuracy requirements.
arXiv Detail & Related papers (2021-04-12T14:37:16Z) - Visualization of Supervised and Self-Supervised Neural Networks via
Attribution Guided Factorization [87.96102461221415]
We develop an algorithm that provides per-class explainability.
In an extensive battery of experiments, we demonstrate the ability of our methods to produce class-specific visualizations.
arXiv Detail & Related papers (2020-12-03T18:48:39Z) - A Self-Training Approach for Point-Supervised Object Detection and
Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector, trained only with point-level annotations, to detect and count objects in crowds.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)