RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a
single GPU for Zero-shot retail product image classification
- URL: http://arxiv.org/abs/2312.10282v2
- Date: Sun, 14 Jan 2024 22:43:58 GMT
- Title: RetailKLIP : Finetuning OpenCLIP backbone using metric learning on a
single GPU for Zero-shot retail product image classification
- Authors: Muktabh Mayank Srivastava
- Abstract summary: We propose finetuning the vision encoder of a CLIP model in a way that its embeddings can be easily used for nearest neighbor based classification.
A nearest neighbor based classifier needs no incremental training for new products, thus saving resources and wait time.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Retail product or packaged grocery goods images need to be classified in
various computer vision applications like self-checkout stores, supply chain
automation and retail execution evaluation. Previous works explore ways to
finetune deep models for this purpose. But because finetuning a large model, or
even a linear layer on top of a pretrained backbone, requires running at least a
few epochs of gradient descent for every new retail product added to the
classification range, frequent retrainings are needed in a real world scenario.
In this work, we propose finetuning the vision encoder of a CLIP model so that
its embeddings can be easily used for nearest neighbor based classification,
while also getting accuracy close to or exceeding full finetuning. A nearest
neighbor based classifier needs no incremental training for new products, thus
saving resources and wait time.
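As a concrete illustration of the nearest neighbor classification described above, the sketch below embeds a small catalog of product images with an OpenCLIP vision encoder and classifies a query image by cosine similarity. This is a minimal sketch, not the authors' code: the model name, checkpoint, and file paths are placeholder assumptions, and in the paper's setting the encoder would first be finetuned with a metric learning objective (the exact loss is not specified in this abstract). The key property is that adding a new product only means appending its embeddings to the gallery; no gradient descent is run.

```python
# Illustrative sketch (not the authors' released code): zero-shot retail product
# classification via nearest-neighbor search over OpenCLIP image embeddings.
# Assumes the open_clip_torch and torch packages; model name, checkpoint tag,
# and image paths below are placeholders.
import torch
import open_clip
from PIL import Image

# Load an OpenCLIP vision backbone; in the paper this encoder would first be
# finetuned with a metric-learning objective on retail product images.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
model.eval()

@torch.no_grad()
def embed(paths):
    """Encode a list of image paths into L2-normalized embeddings."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    feats = model.encode_image(batch)
    return feats / feats.norm(dim=-1, keepdim=True)

# Build the reference "gallery": one or more labeled images per product SKU.
# Adding a new product only requires appending its embeddings and label here;
# no incremental training is needed.
gallery_paths = ["sku_001/img0.jpg", "sku_002/img0.jpg"]   # placeholder paths
gallery_labels = ["sku_001", "sku_002"]
gallery = embed(gallery_paths)

def classify(query_path):
    """Nearest-neighbor classification: highest cosine similarity to the gallery."""
    q = embed([query_path])      # (1, d), already normalized
    sims = q @ gallery.T         # cosine similarity since embeddings are unit-norm
    return gallery_labels[sims.argmax().item()]

print(classify("checkout_crop.jpg"))   # placeholder query image
```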
Related papers
- Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning [86.15009879251386]
We propose a novel architecture and method of explainable classification with Concept Bottleneck Models (CBM)
CBMs require an additional set of concepts to leverage.
We show a significant increase in accuracy using sparse hidden layers in CLIP-based bottleneck models.
arXiv Detail & Related papers (2024-04-04T09:43:43Z)
- Image-free Classifier Injection for Zero-Shot Classification [72.66409483088995]
Zero-shot learning models achieve remarkable results on image classification for samples from classes that were not seen during training.
We aim to equip pre-trained models with zero-shot classification capabilities without the use of image data.
We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS)
arXiv Detail & Related papers (2023-08-21T09:56:48Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- Using Contrastive Learning and Pseudolabels to learn representations for Retail Product Image Classification [0.0]
We use contrastive learning and pseudolabel-based noisy student training to learn representations whose accuracy is comparable to finetuning the entire ConvNet backbone for retail product image classification.
arXiv Detail & Related papers (2021-10-07T17:29:05Z)
- Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation [84.1985497426083]
Convolutional neural networks are ill-equipped for incremental learning, where new classes become available but the initial training data is not retained.
We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise.
arXiv Detail & Related papers (2021-04-02T03:47:16Z)
- The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models [115.49214555402567]
Pre-trained weights often boost a wide range of downstream tasks including classification, detection, and segmentation.
Recent studies suggest that pre-training benefits from gigantic model capacity.
In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH)
arXiv Detail & Related papers (2020-12-12T21:53:55Z)
- Move-to-Data: A new Continual Learning approach with Deep CNNs, Application for image-class recognition [0.0]
It is necessary to pre-train the model in a "training recording phase" and then adjust it to newly arriving data.
We propose a fast continual learning layer at the end of the neural network.
arXiv Detail & Related papers (2020-06-12T13:04:58Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- Novelty-Prepared Few-Shot Classification [24.42397780877619]
We propose to use a novelty-prepared loss function, called self-compacting softmax loss (SSL), for few-shot classification.
In experiments on CUB-200-2011 and mini-ImageNet datasets, we show that SSL leads to significant improvement of the state-of-the-art performance.
arXiv Detail & Related papers (2020-03-01T14:44:29Z)
- Bag of Tricks for Retail Product Image Classification [0.0]
We present various tricks to increase the accuracy of deep learning models on different types of retail product image classification datasets.
A new neural network layer called the Local-Concepts-Accumulation (LCA) layer gives consistent gains across multiple datasets.
Two other tricks we find to increase accuracy on retail product identification are using an Instagram-pretrained ConvNet and using Maximum Entropy as an auxiliary loss for classification.
arXiv Detail & Related papers (2020-01-12T20:20:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.