Retail-786k: a Large-Scale Dataset for Visual Entity Matching
- URL: http://arxiv.org/abs/2309.17164v2
- Date: Mon, 11 Mar 2024 15:11:11 GMT
- Title: Retail-786k: a Large-Scale Dataset for Visual Entity Matching
- Authors: Bianca Lamm (1 and 2), Janis Keuper (1) ((1) IMLA, Offenburg
University, (2) Markant Services International GmbH)
- Abstract summary: This paper introduces the first publicly available large-scale dataset for "visual entity matching".
We provide a total of ~786k manually annotated, high-resolution product images covering ~18k different individual retail products, which are grouped into ~3k entities.
The proposed "visual entity matching" task constitutes a novel learning problem that cannot be sufficiently solved using standard image-based classification and retrieval algorithms.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Entity Matching (EM) defines the task of learning to group objects by
transferring semantic concepts from example groups (=entities) to unseen data.
Despite the general availability of image data in the context of many
EM-problems, most currently available EM-algorithms solely rely on (textual)
meta data. In this paper, we introduce the first publicly available large-scale
dataset for "visual entity matching", based on a production level use case in
the retail domain. Using scanned advertisement leaflets, collected over several
years from different European retailers, we provide a total of ~786k manually
annotated, high resolution product images containing ~18k different individual
retail products which are grouped into ~3k entities. The annotation of these
product entities is based on a price comparison task, where each entity forms
an equivalence class of comparable products. Following a first baseline
evaluation, we show that the proposed "visual entity matching" constitutes a
novel learning problem which cannot be sufficiently solved using standard
image-based classification and retrieval algorithms. Instead, novel approaches
that allow transferring example-based visual equivalence classes to new data
are needed to address the proposed problem. The aim of this paper is to
provide a benchmark for such algorithms.
Information about the dataset, evaluation code and download instructions are
provided under https://www.retail-786k.org/.
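To make the task concrete, the following is a minimal sketch of the kind of retrieval baseline the abstract argues is insufficient: each entity (equivalence class of comparable products) is represented by the mean of its example-image embeddings, and an unseen product image is assigned to the entity with the most similar prototype. The embeddings are assumed to come from any pretrained image encoder; the function names and the use of cosine similarity are illustrative assumptions, not the paper's actual evaluation protocol.

```python
import numpy as np

def entity_prototypes(embeddings, entity_ids):
    """Build one prototype per entity by averaging and L2-normalizing
    the embeddings of that entity's annotated example images.

    embeddings: (n_images, dim) array from any image encoder (assumed given).
    entity_ids: length-n_images list of entity labels.
    """
    entity_ids = np.asarray(entity_ids)
    ids = sorted(set(entity_ids.tolist()))
    protos = np.stack([embeddings[entity_ids == e].mean(axis=0) for e in ids])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return ids, protos

def match_entities(queries, ids, protos):
    """Assign each query embedding to the entity whose prototype has the
    highest cosine similarity (a simple nearest-prototype baseline)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    return [ids[i] for i in (q @ protos.T).argmax(axis=1)]
```

Because entities in Retail-786k are defined by a price-comparison notion of equivalence rather than pure visual identity, such embedding-similarity baselines are exactly where the paper reports standard classification and retrieval approaches fall short.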
Related papers
- Unsupervised Collaborative Metric Learning with Mixed-Scale Groups for General Object Retrieval [28.810040080126324]
This paper presents a novel unsupervised deep metric learning approach, termed unsupervised collaborative metric learning with mixed-scale groups (MS-UGCML).
We show that our approach can learn embeddings for objects of varying scales, with object-level and image-level mAP improvements of up to 6.69% and 10.03%, respectively.
arXiv Detail & Related papers (2024-03-16T04:01:50Z) - Text-Based Product Matching -- Semi-Supervised Clustering Approach [9.748519919202986]
This paper presents a new approach to product matching based on semi-supervised clustering.
We study the properties of this method by experimenting with the IDEC algorithm on a real-world dataset.
arXiv Detail & Related papers (2024-02-01T18:52:26Z) - Thinking Like an Annotator: Generation of Dataset Labeling Instructions [59.603239753484345]
We introduce a new task, Labeling Instruction Generation, to address missing publicly available labeling instructions.
We take a reasonably annotated dataset and: 1) generate a set of examples that are visually representative of each category in the dataset; 2) provide a text label that corresponds to each of the examples.
This framework acts as a proxy to human annotators that can help to both generate a final labeling instruction set and evaluate its quality.
arXiv Detail & Related papers (2023-06-24T18:32:48Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - Automatic Generation of Product-Image Sequence in E-commerce [46.06263129000091]
Multi-modality Unified Imagesequence (MUIsC) is able to simultaneously detect all categories through learning rule violations.
By Dec 2021, our AGPIS framework has generated high-standard images for about 1.5 million products and achieves a 13.6% reject rate.
arXiv Detail & Related papers (2022-06-26T23:38:42Z) - Open-World Entity Segmentation [70.41548013910402]
We introduce a new image segmentation task, termed Entity Segmentation (ES), with the aim of segmenting all visual entities in an image without considering semantic category labels.
All semantically-meaningful segments are equally treated as categoryless entities and there is no thing-stuff distinction.
ES enables the following: (1) merging multiple datasets to form a large training set without the need to resolve label conflicts; (2) any model trained on one dataset can generalize exceptionally well to other datasets with unseen domains.
arXiv Detail & Related papers (2021-07-29T17:59:05Z) - Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and
Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z) - Rethinking Object Detection in Retail Stores [55.359582952686175]
We propose a new task, simultaneous object localization and counting, abbreviated as Locount.
Locount requires algorithms to localize groups of objects of interest with the number of instances.
We collect a large-scale object localization and counting dataset with rich annotations in retail stores.
arXiv Detail & Related papers (2020-03-18T14:01:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.