Training with Product Digital Twins for AutoRetail Checkout
- URL: http://arxiv.org/abs/2308.09708v1
- Date: Fri, 18 Aug 2023 17:58:10 GMT
- Title: Training with Product Digital Twins for AutoRetail Checkout
- Authors: Yue Yao, Xinyu Tian, Zheng Tang, Sujit Biswas, Huan Lei, Tom Gedeon,
Liang Zheng
- Abstract summary: We propose a training data optimization framework, i.e., training with digital twins (DtTrain).
These digital twins inherit product labels and, when augmented, form the Digital Twin training set (DT set).
In our experiment, we show that the DT set outperforms training sets created by existing dataset synthesis methods in terms of counting accuracy.
- Score: 28.823850493539293
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating the checkout process is important in smart retail, where users
effortlessly pass products by hand through a camera, triggering automatic
product detection, tracking, and counting. In this emerging area, due to the
lack of annotated training data, we introduce a dataset comprising product 3D
models, which allows for fast, flexible, and large-scale training data
generation through graphic engine rendering. Within this context, we discern an
intriguing facet: because of the user "hands-on" approach, bias in user
behavior leads to distinct patterns in the real checkout process. The existence
of such patterns would compromise training effectiveness if the training data
fail to reflect them. To address this user bias problem, we propose a training
data optimization framework, i.e., training with digital twins (DtTrain).
Specifically, we leverage the product 3D models and optimize their rendering
viewpoint and illumination to generate "digital twins" that visually resemble
representative user images. These digital twins inherit product labels and,
when augmented, form the Digital Twin training set (DT set). Because the
digital twins individually mimic user bias, the resulting DT training set
better reflects the characteristics of the target scenario and allows us to
train more effective product detection and tracking models. In our experiment,
we show that DT set outperforms training sets created by existing dataset
synthesis methods in terms of counting accuracy. Moreover, by combining DT set
with pseudo-labeled real checkout data, further improvement is observed. The
code is available at https://github.com/yorkeyao/Automated-Retail-Checkout.
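As a rough illustration of the twin-fitting idea described above, the sketch below searches over viewpoint and illumination parameters to minimize visual dissimilarity to a representative user image. The `render` function here is a hypothetical stand-in for the paper's graphics-engine render-and-compare step, and the random-search loop is only an assumed optimization strategy, not the authors' actual method:

```python
import random

def render(azimuth, elevation, light, target=(40.0, 20.0, 0.7)):
    # Hypothetical placeholder for the render-and-compare step:
    # returns a visual dissimilarity score between the rendered 3D
    # product model and a representative user image. A toy quadratic
    # stands in for the real graphics-engine renderer here.
    ta, te, tl = target
    return (azimuth - ta) ** 2 + (elevation - te) ** 2 + 100 * (light - tl) ** 2

def fit_digital_twin(steps=2000, seed=0):
    """Random search over camera azimuth/elevation and light intensity."""
    rng = random.Random(seed)
    best = (rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(0, 1))
    best_loss = render(*best)
    for _ in range(steps):
        cand = (rng.uniform(0, 360), rng.uniform(-90, 90), rng.uniform(0, 1))
        loss = render(*cand)
        if loss < best_loss:
            best, best_loss = cand, loss
    # The best-scoring parameters define one "digital twin" rendering,
    # which would inherit the product label of the underlying 3D model.
    return best, best_loss
```

In practice the search space, renderer, and dissimilarity measure would come from the graphics engine and the DtTrain pipeline; this toy version only shows the shape of the optimization.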
Related papers
- Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection [52.66283064389691]
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
We show that image-based shelf-supervision is helpful for training LiDAR-only and multi-modal (RGB + LiDAR) detectors.
arXiv Detail & Related papers (2024-06-14T15:21:57Z)
- Enhancing 2D Representation Learning with a 3D Prior [21.523007105586217]
Learning robust and effective representations of visual data is a fundamental task in computer vision.
Traditionally, this is achieved by training models with labeled data which can be expensive to obtain.
We propose a new approach for strengthening existing self-supervised methods by explicitly enforcing a strong 3D structural prior.
arXiv Detail & Related papers (2024-06-04T17:55:22Z)
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training [44.790636524264]
Point Prompt Training is a novel framework for multi-dataset synergistic learning in the context of 3D representation learning.
It can overcome the negative transfer associated with synergistic learning and produce generalizable representations.
It achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training.
arXiv Detail & Related papers (2023-08-18T17:59:57Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- TDT: Teaching Detectors to Track without Fully Annotated Videos [2.8292841621378844]
One-stage trackers that predict both detections and appearance embeddings in one forward pass have received much attention.
Our proposed one-stage solution matches the two-stage counterpart in quality but is 3 times faster.
arXiv Detail & Related papers (2022-05-11T15:56:17Z)
- Supervised Contrastive Learning for Product Matching [2.28438857884398]
This poster is the first work that applies contrastive learning to the task of product matching in e-commerce.
We employ a supervised contrastive learning technique to pre-train a Transformer encoder which is afterwards fine-tuned for the matching problem.
We propose a source-aware sampling strategy which enables contrastive learning to be applied for use cases in which the training data does not contain product identifiers.
arXiv Detail & Related papers (2022-02-04T12:16:38Z)
- Multi-Task Self-Training for Learning General Representations [97.01728635294879]
Multi-task self-training (MuST) harnesses the knowledge in independent specialized teacher models to train a single general student model.
MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large scale datasets.
arXiv Detail & Related papers (2021-08-25T17:20:50Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Self-Supervised Pretraining of 3D Features on any Point-Cloud [40.26575888582241]
We present a simple self-supervised pretraining method that can work with any 3D data without 3D registration.
We evaluate our models on 9 benchmarks for object detection, semantic segmentation, and object classification, where they achieve state-of-the-art results and can outperform supervised pretraining.
arXiv Detail & Related papers (2021-01-07T18:55:21Z) - Self-Supervised Person Detection in 2D Range Data using a Calibrated
Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the created dataset compact, we propose to apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.