RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image
Classification
- URL: http://arxiv.org/abs/2006.12634v7
- Date: Wed, 1 Sep 2021 16:21:13 GMT
- Title: RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image
Classification
- Authors: Jingtian Peng, Chang Xiao, Yifan Li
- Abstract summary: RP2K is a new large-scale retail product dataset for fine-grained image classification.
Unlike previous datasets, we collect more than 500,000 images of retail products on shelves belonging to 2000 different products.
- Score: 19.82453283089643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce RP2K, a new large-scale retail product dataset for fine-grained
image classification. Unlike previous datasets focusing on relatively few
products, we collect more than 500,000 images of retail products on shelves
belonging to 2000 different products. Our dataset aims to advance the research
in retail object recognition, which has massive applications such as automatic
shelf auditing and image-based product information retrieval. Our dataset
enjoys the following properties: (1) It is by far the largest-scale dataset in
terms of product categories. (2) All images were captured manually in physical
retail stores under natural lighting, matching the scenario of real
applications. (3) We provide rich annotations for each object, including
sizes, shapes, and flavors/scents. We believe our dataset can benefit both
computer vision research and the retail industry. Our dataset is publicly available
at https://www.pinlandata.com/rp2k_dataset.
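Since RP2K is framed as a fine-grained classification benchmark, a standard baseline can be set up as follows. This is a minimal fine-tuning sketch that assumes the dataset unpacks into per-class image folders (rp2k/train/<product_id>/*.jpg); we have not verified RP2K's actual directory layout.
```python
# Minimal fine-grained classification baseline for RP2K (illustrative sketch).
# Assumes per-class folders rp2k/train/<product_id>/*.jpg -- the real layout may differ.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 2000  # roughly 2000 products per the abstract

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("rp2k/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

# ImageNet-pretrained backbone with the head replaced for 2000 products.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```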
Related papers
- Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models [50.370043676415875]
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods.
We introduce the MIMEX dataset, comprising 28 distinct product categories.
We benchmark the zero-shot object classification performance of state-of-the-art vision-language models (VLMs) on the proposed MIMEX dataset; a minimal sketch of this kind of zero-shot setup follows below.
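To make the zero-shot setup concrete, here is a minimal sketch of VLM-based zero-shot product classification using the open-source CLIP model; the class names, prompt template, and image path are illustrative placeholders, not taken from MIMEX.
```python
# Zero-shot product classification with CLIP (illustrative sketch).
# Class names, prompt, and image path are hypothetical, not MIMEX data.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["cereal box", "soda can", "shampoo bottle"]  # hypothetical
prompts = clip.tokenize(
    [f"a photo of a {c} on a store shelf" for c in classes]).to(device)
image = preprocess(Image.open("shelf_crop.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, prompts)  # image-text similarity
    probs = logits_per_image.softmax(dim=-1)

print(classes[probs.argmax().item()])  # predicted class, no training needed
```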
arXiv Detail & Related papers (2024-09-23T12:28:40Z)
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset in the wild.
The dataset was carefully scraped from the Internet and captured at a variety of locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- Retail-786k: A Large-Scale Dataset for Visual Entity Matching [0.0]
This paper introduces the first publicly available large-scale dataset for "visual entity matching".
We provide a total of 786k manually annotated, high-resolution product images containing 18k different individual retail products, which are grouped into 3k entities.
The proposed "visual entity matching" task constitutes a novel learning problem that cannot be sufficiently solved using standard image-based classification and retrieval algorithms; a simple retrieval-style baseline is sketched below for contrast.
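For contrast with the entity matching task, here is a minimal sketch of the standard embedding-plus-nearest-centroid retrieval baseline that the paper argues is insufficient; the embeddings, entity IDs, and query are placeholders, not Retail-786k data.
```python
# Nearest-centroid retrieval baseline for entity matching (illustrative sketch).
# Embeddings and entity IDs below are random stand-ins, not Retail-786k data.
import numpy as np

def assign_entity(query_emb: np.ndarray,
                  gallery_embs: np.ndarray,
                  entity_ids: list[str]) -> str:
    """Assign a query image embedding to the entity with the nearest centroid."""
    entities = sorted(set(entity_ids))
    ids = np.asarray(entity_ids)
    # Mean embedding per entity, L2-normalized for cosine similarity.
    centroids = np.stack([gallery_embs[ids == e].mean(axis=0) for e in entities])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    return entities[int(np.argmax(centroids @ q))]

# Toy usage with random stand-in embeddings:
rng = np.random.default_rng(0)
gallery = rng.normal(size=(6, 128))
labels = ["entity_a", "entity_a", "entity_b", "entity_b", "entity_c", "entity_c"]
print(assign_entity(rng.normal(size=128), gallery, labels))
```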
arXiv Detail & Related papers (2023-09-29T11:58:26Z)
- Expanding Small-Scale Datasets with Guided Imagination [92.5276783917845]
Dataset expansion is a new task aimed at growing a ready-to-use small dataset by automatically creating new labeled samples.
The proposed Guided Imagination Framework (GIF) conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model; a schematic sketch of this optimization is given below.
GIF-SD obtains 13.5% higher model accuracy on natural image datasets than unguided expansion with Stable Diffusion (SD).
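The following is a heavily simplified, schematic sketch in the spirit of GIF, not the authors' method: a seed image's latent is perturbed and updated so that a classifier still predicts the seed's label (class maintenance) while a second term pushes the new sample away from a trivial copy (diversity). The encoder, decoder, classifier, and hyperparameters are all placeholders.
```python
# Schematic guided latent optimization in the spirit of GIF (not the paper's code).
# `encoder`, `decoder`, and `classifier` are placeholder modules.
import torch
import torch.nn.functional as F

def imagine(seed_img, seed_label, encoder, decoder, classifier,
            steps=20, lr=0.1, div_weight=0.5):
    z = encoder(seed_img).detach()
    delta = torch.zeros_like(z, requires_grad=True)  # learnable perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        new_img = decoder(z + delta)
        logits = classifier(new_img)
        # Class maintenance: the new sample should keep the seed's label.
        cls_loss = F.cross_entropy(logits, seed_label)
        # Diversity: push the perturbed latent away from the seed latent.
        div_loss = -F.mse_loss(z + delta, z)
        loss = cls_loss + div_weight * div_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z + delta).detach()  # one new "imagined" labeled sample
```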
arXiv Detail & Related papers (2022-11-25T09:38:22Z)
- VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments [74.72656607288185]
We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took.
It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments.
Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects.
arXiv Detail & Related papers (2022-07-24T20:44:51Z)
- Unitail: Detecting, Reading, and Matching in Retail Scene [37.1516435926562]
We introduce the United Retail (Unitail) dataset, a benchmark of basic visual tasks on products.
With 1.8M quadrilateral-shaped instances, Unitail offers a detection dataset whose annotations align better with actual product appearance.
It also provides a gallery-style OCR dataset containing 1454 product categories, 30k text regions, and 21k transcriptions.
arXiv Detail & Related papers (2022-04-01T09:06:48Z)
- A Survey on RGB-D Datasets [69.73803123972297]
This paper reviews and categorizes image datasets that include depth information.
We gathered 203 datasets that contain accessible data and grouped them into three categories: scene/objects, body, and medical.
arXiv Detail & Related papers (2022-01-15T05:35:19Z)
- eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges [8.204924070199866]
eProduct is a benchmark dataset for training and evaluation on various visual search solutions in a real-world setting.
eProduct consists of a training set and an evaluation set; the training set contains 1.3M+ listing images with titles and hierarchical category labels for model development.
We present eProduct's construction steps, analyze its diversity, and report the performance of baseline models trained on it.
arXiv Detail & Related papers (2021-07-13T05:28:34Z)
- FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery [21.9319970004788]
We propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery.
All objects in the FAIR1M dataset are annotated with respect to 5 categories and 37 sub-categories by oriented bounding boxes; a minimal sketch of this annotation geometry is given below.
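As an aside on the geometry, an oriented bounding box is commonly parameterized as a center, size, and rotation angle; the following is a minimal sketch of converting that parameterization to corner points. This is a generic convention, not necessarily FAIR1M's exact annotation format.
```python
# Convert an oriented bounding box (cx, cy, w, h, theta) to its 4 corners.
# Generic OBB convention; FAIR1M's exact annotation format may differ.
import numpy as np

def obb_corners(cx, cy, w, h, theta):
    """Return the 4 corner points of a box rotated by `theta` radians."""
    # Axis-aligned corner offsets from the center, counterclockwise.
    offsets = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                        [w / 2,  h / 2], [-w / 2,  h / 2]])
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return offsets @ rot.T + np.array([cx, cy])

print(obb_corners(10.0, 5.0, 4.0, 2.0, np.pi / 6))
```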
arXiv Detail & Related papers (2021-03-09T17:20:15Z)
- TAO: A Large-Scale Benchmark for Tracking Any Object [95.87310116010185]
The Tracking Any Object (TAO) dataset consists of 2,907 high-resolution videos, captured in diverse environments, which are half a minute long on average.
We ask annotators to label objects that move at any point in the video, and give names to them post factum.
Our vocabulary is both significantly larger and qualitatively different from existing tracking datasets.
arXiv Detail & Related papers (2020-05-20T21:07:28Z)
- Bag of Tricks for Retail Product Image Classification [0.0]
We present various tricks to increase the accuracy of deep learning models on different types of retail product image classification datasets.
A new neural network layer called the Local-Concepts-Accumulation (LCA) layer gives consistent gains across multiple datasets.
Two other tricks found to increase accuracy on retail product identification are using an Instagram-pretrained ConvNet and using maximum entropy as an auxiliary loss for classification; a minimal sketch of the latter appears below.
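To illustrate the latter trick, here is a minimal sketch of a maximum-entropy auxiliary loss: the prediction entropy is subtracted from the cross-entropy objective, so minimizing the total loss penalizes overconfident output distributions. The weighting is a placeholder, not a value from the paper.
```python
# Cross-entropy with a maximum-entropy auxiliary term (illustrative sketch).
# `ent_weight` is a placeholder hyperparameter, not a value from the paper.
import torch
import torch.nn.functional as F

def max_entropy_loss(logits, targets, ent_weight=0.1):
    """Cross-entropy minus weighted prediction entropy."""
    ce = F.cross_entropy(logits, targets)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - ent_weight * entropy  # minimizing this maximizes entropy

# Toy usage:
logits = torch.randn(8, 2000)           # batch of 8, 2000 classes
targets = torch.randint(0, 2000, (8,))
print(max_entropy_loss(logits, targets))
```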
arXiv Detail & Related papers (2020-01-12T20:20:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.