An Improved Deep Learning Approach For Product Recognition on Racks in
Retail Stores
- URL: http://arxiv.org/abs/2202.13081v1
- Date: Sat, 26 Feb 2022 06:51:36 GMT
- Title: An Improved Deep Learning Approach For Product Recognition on Racks in
Retail Stores
- Authors: Ankit Sinha, Soham Banerjee and Pratik Chattopadhyay
- Abstract summary: Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition.
We develop a two-stage object detection and recognition pipeline comprising a Faster-RCNN-based object localizer and a ResNet-18-based image encoder.
Each of the models is fine-tuned using appropriate data sets for better prediction, and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model.
- Score: 2.470815298095903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated product recognition in retail stores is an important real-world
application in the domain of Computer Vision and Pattern Recognition. In this
paper, we consider the problem of automatically identifying the classes of the
products placed on racks in retail stores from an image of the rack and
information about the query/product images. We improve upon the existing
approaches in terms of effectiveness and memory requirement by developing a
two-stage object detection and recognition pipeline comprising a
Faster-RCNN-based object localizer that detects the object regions in the rack
image and a ResNet-18-based image encoder that classifies the detected regions
into the appropriate classes. Each of the models is fine-tuned using
appropriate data sets for better prediction, and data augmentation is performed
on each query image to prepare an extensive gallery set for fine-tuning the
ResNet-18-based product recognition model. This encoder is trained using a
triplet loss function following the strategy of online-hard-negative-mining for
improved prediction. The proposed models are lightweight and can be connected
in an end-to-end manner during deployment for automatically identifying each
product object placed in a rack image. Extensive experiments using Grozi-32k
and GP-180 data sets verify the effectiveness of the proposed model.
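The abstract mentions training the encoder with a triplet loss and online hard negative mining. The listing does not give the authors' implementation, so the following is only a minimal sketch of the general idea in plain Python. The function names, the Euclidean distance, and the margin of 0.2 are assumptions made for illustration, not values taken from the paper. For every anchor-positive pair in a batch, the closest sample of a different class is chosen as the negative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss_hard_negatives(embeddings, labels, margin=0.2):
    """Mean triplet loss with the hardest in-batch negative per anchor.

    Hypothetical helper: for each (anchor, positive) pair with the same
    label, the negative is the closest embedding with a different label.
    Returns 0.0 when the batch contains no valid triplet.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        # Distances from the anchor to every sample of another class.
        negatives = [
            euclidean(anchor, other)
            for other, other_label in zip(embeddings, labels)
            if other_label != label
        ]
        if not negatives:
            continue
        hardest_negative = min(negatives)
        for j, (positive, pos_label) in enumerate(zip(embeddings, labels)):
            if j == i or pos_label != label:
                continue
            d_pos = euclidean(anchor, positive)
            # Hinge: the positive should be closer than the negative by `margin`.
            losses.append(max(0.0, d_pos - hardest_negative + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two classes with two 2-D embeddings each (made-up values).
batch = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
batch_labels = ["a", "a", "b", "b"]
loss = triplet_loss_hard_negatives(batch, batch_labels)
```

In a real training setup the embeddings would come from the image encoder and the loss would be computed on tensors so that gradients can flow back. The sketch only shows how negatives are selected inside a batch.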
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset.
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- Autoencoders with Intrinsic Dimension Constraints for Learning Low Dimensional Image Representations [27.40298734517967]
We propose a novel deep representation learning approach with an autoencoder, which incorporates regularization of the global and local intrinsic dimension (ID) constraints into the reconstruction of data representations.
This approach not only preserves the global manifold structure of the whole dataset, but also maintains the local manifold structure of the feature maps of each point.
arXiv Detail & Related papers (2023-04-16T03:43:08Z)
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z)
- Uncertainty Aware Active Learning for Reconfiguration of Pre-trained Deep Object-Detection Networks for New Target Domains [0.0]
Object detection is one of the most important and fundamental aspects of computer vision tasks.
To obtain training data for object detection models efficiently, many datasets opt to obtain their unannotated data in video format.
Annotating every frame from a video is costly and inefficient since many frames contain very similar information for the model to learn from.
In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem.
arXiv Detail & Related papers (2023-03-22T17:14:10Z)
- Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by only training new modularized blocks while freezing all the original weights.
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
- Bag of Tricks for Retail Product Image Classification [0.0]
We present various tricks to increase accuracy of Deep Learning models on different types of retail product image classification datasets.
A new neural network layer called the Local-Concepts-Accumulation (LCA) layer gives consistent gains across multiple datasets.
Two other tricks we find to increase accuracy on retail product identification are using an Instagram-pretrained ConvNet and using Maximum Entropy as an auxiliary loss for classification.
arXiv Detail & Related papers (2020-01-12T20:20:07Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.