Training with Product Digital Twins for AutoRetail Checkout
- URL: http://arxiv.org/abs/2308.09708v1
- Date: Fri, 18 Aug 2023 17:58:10 GMT
- Title: Training with Product Digital Twins for AutoRetail Checkout
- Authors: Yue Yao, Xinyu Tian, Zheng Tang, Sujit Biswas, Huan Lei, Tom Gedeon,
Liang Zheng
- Abstract summary: We propose a training data optimization framework, i.e., training with digital twins (DtTrain).
These digital twins inherit product labels and, when augmented, form the Digital Twin training set (DT set).
In our experiment, we show that the DT set outperforms training sets created by existing dataset synthesis methods in terms of counting accuracy.
- Score: 28.823850493539293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating the checkout process is important in smart retail, where users
effortlessly pass products by hand through a camera, triggering automatic
product detection, tracking, and counting. In this emerging area, due to the
lack of annotated training data, we introduce a dataset comprising product 3D
models, which allows for fast, flexible, and large-scale training data
generation through graphic engine rendering. Within this context, we discern an
intriguing facet: because of the user "hands-on" approach, bias in user
behavior leads to distinct patterns in the real checkout process. The existence
of such patterns would compromise training effectiveness if the training data
fail to reflect them. To address this user bias problem, we propose a training
data optimization framework, i.e., training with digital twins (DtTrain).
Specifically, we leverage the product 3D models and optimize their rendering
viewpoint and illumination to generate "digital twins" that visually resemble
representative user images. These digital twins inherit product labels and,
when augmented, form the Digital Twin training set (DT set). Because the
digital twins individually mimic user bias, the resulting DT training set
better reflects the characteristics of the target scenario and allows us to
train more effective product detection and tracking models. In our experiment,
we show that DT set outperforms training sets created by existing dataset
synthesis methods in terms of counting accuracy. Moreover, by combining DT set
with pseudo-labeled real checkout data, further improvement is observed. The
code is available at https://github.com/yorkeyao/Automated-Retail-Checkout.
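As a rough illustration of the twin-fitting idea described above, the sketch below searches over viewpoint and illumination parameters to minimize visual dissimilarity to a representative user image. The `render` function here is a hypothetical stand-in for the paper's graphics-engine render-and-compare step, and the random-search loop is only an assumed optimization strategy, not the authors' actual method:

```python
import random

def render(azimuth, elevation, light, target=(40.0, 20.0, 0.7)):
    # Hypothetical placeholder for the render-and-compare step:
    # returns a visual dissimilarity score between the rendered 3D
    # product model and a representative user image. A toy quadratic
    # stands in for the real graphics-engine renderer here.
    ta, te, tl = target
    return (azimuth - ta) ** 2 + (elevation - te) ** 2 + 100 * (light - tl) ** 2

def fit_digital_twin(steps=2000, seed=0):
    """Random search over camera azimuth/elevation and light intensity."""
    rng = random.Random(seed)
    best = (rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(0, 1))
    best_loss = render(*best)
    for _ in range(steps):
        cand = (rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(0, 1))
        loss = render(*cand)
        if loss < best_loss:
            best, best_loss = cand, loss
    # The best-scoring parameters define one "digital twin" rendering,
    # which would inherit the product label of the underlying 3D model.
    return best, best_loss
```

In practice the search space, renderer, and dissimilarity measure would come from the graphics engine and the DtTrain pipeline; this toy version only shows the shape of the optimization.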
Related papers
- Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection [52.66283064389691]
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
We show that image-based shelf-supervision is helpful for training LiDAR-only and multi-modal (RGB + LiDAR) detectors.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- Enhancing 2D Representation Learning with a 3D Prior [21.523007105586217]
Learning robust and effective representations of visual data is a fundamental task in computer vision.
Traditionally, this is achieved by training models with labeled data which can be expensive to obtain.
We propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior.
arXiv Detail & Related papers (2024-06-04T17:55:22Z)
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- TDT: Teaching Detectors to Track without Fully Annotated Videos [2.8292841621378844]
One-stage trackers that predict both detections and appearance embeddings in one forward pass have received much attention.
Our proposed one-stage solution matches the two-stage counterpart in quality but is 3 times faster.
arXiv Detail & Related papers (2022-05-11T15:56:17Z)
- Supervised Contrastive Learning for Product Matching [2.28438857884398]
This poster is the first work that applies contrastive learning to the task of product matching in e-commerce.
We employ a supervised contrastive learning technique to pre-train a Transformer encoder which is afterwards fine-tuned for the matching problem.
We propose a source-aware sampling strategy which enables contrastive learning to be applied for use cases in which the training data does not contain product identifiers.
arXiv Detail & Related papers (2022-02-04T12:16:38Z)
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Self-Supervised Pretraining of 3D Features on any Point-Cloud [40.26575888582241]
We present a simple self-supervised pretraining method that can work with any 3D data without 3D registration.
We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining.
arXiv Detail & Related papers (2021-01-07T18:55:21Z) - Self-Supervised Person Detection in 2D Range Data using a Calibrated
Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the created dataset compact, we propose to apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.