SKU-Patch: Towards Efficient Instance Segmentation for Unseen Objects in
Auto-Store
- URL: http://arxiv.org/abs/2311.04645v1
- Date: Wed, 8 Nov 2023 12:44:38 GMT
- Title: SKU-Patch: Towards Efficient Instance Segmentation for Unseen Objects in
Auto-Store
- Authors: Biqi Yang, Weiliang Tang, Xiaojie Gao, Xianzhi Li, Yun-Hui Liu,
Chi-Wing Fu, Pheng-Ann Heng
- Abstract summary: In large-scale storehouses, precise instance masks are crucial for robotic bin picking.
This paper presents a new patch-guided instance segmentation solution, leveraging only a few image patches for each incoming new SKU.
SKU-Patch yields an average of nearly 100% grasping success rate on more than 50 unseen SKUs in a robot-aided auto-store logistic pipeline.
- Score: 102.45729472142526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In large-scale storehouses, precise instance masks are crucial for robotic
bin picking but are challenging to obtain. Existing instance segmentation
methods typically rely on a tedious process of scene collection, mask
annotation, and network fine-tuning for every single Stock Keeping Unit (SKU).
This paper presents SKU-Patch, a new patch-guided instance segmentation
solution, leveraging only a few image patches for each incoming new SKU to
predict accurate and robust masks, without tedious manual effort and model
re-training. Technical-wise, we design a novel transformer-based network with
(i) a patch-image correlation encoder to capture multi-level image features
calibrated by patch information and (ii) a patch-aware transformer decoder with
parallel task heads to generate instance masks. Extensive experiments on four
storehouse benchmarks manifest that SKU-Patch is able to achieve the best
performance over the state-of-the-art methods. Also, SKU-Patch yields an
average of nearly 100% grasping success rate on more than 50 unseen SKUs in a
robot-aided auto-store logistic pipeline, showing its effectiveness and
practicality.
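The abstract describes image features "calibrated by patch information" but gives no formula. The following is a minimal sketch of one plausible reading, not the authors' encoder: each image location is scored by its best cosine similarity to any SKU patch embedding, and the feature at that location is then scaled by that score. The function names (`l2_normalize`, `correlation_map`, `calibrate`), the max-over-patches rule, and the clamping of negative scores to zero are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def correlation_map(image_feats, patch_embeds):
    """Hypothetical patch-image correlation.

    image_feats:  H x W grid of C-dimensional feature vectors.
    patch_embeds: K patch embeddings, each C-dimensional.
    Returns an H x W grid holding, per location, the maximum cosine
    similarity between the image feature and any patch embedding.
    """
    patches = [l2_normalize(p) for p in patch_embeds]
    corr = []
    for row in image_feats:
        corr_row = []
        for feat in row:
            f = l2_normalize(feat)
            corr_row.append(max(sum(a * b for a, b in zip(f, p)) for p in patches))
        corr.append(corr_row)
    return corr


def calibrate(image_feats, corr):
    """Scale each feature vector by its correlation score, clamped at zero (assumed rule)."""
    return [
        [[v * max(c, 0.0) for v in feat] for feat, c in zip(row, corr_row)]
        for row, corr_row in zip(image_feats, corr)
    ]


# Toy 2 x 2 feature map with 2 channels and a single patch embedding.
feats = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 1.0], [0.0, 0.0]]]
patches = [[2.0, 0.0]]
corr = correlation_map(feats, patches)
calibrated = calibrate(feats, corr)
```

In this toy example, locations whose features point in the same direction as the patch embedding keep their magnitude, while orthogonal or empty locations are suppressed. A learned encoder would replace the fixed cosine rule with trained attention or correlation layers.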
Related papers
- SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images [3.2099042811875833]
We propose a strategy for adapting the Segment Anything (SAM) to anatomical segmentation tasks in medical images.
We leverage few-shot embeddings derived from a limited set of labeled images as prompts for anatomical querying objects captured in image embeddings.
Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms.
arXiv Detail & Related papers (2024-07-05T17:07:25Z)
- Mask Propagation for Efficient Video Semantic Segmentation [63.09523058489429]
Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence.
We propose an efficient mask propagation framework for VSS, called MPVSS.
Our framework reduces FLOPs by up to 4x compared to the per-frame Mask2Former baseline, with at most 2% mIoU degradation on the Cityscapes validation set.
arXiv Detail & Related papers (2023-10-29T09:55:28Z)
- Fast Training of Diffusion Models with Masked Transformers [107.77340216247516]
We propose an efficient approach to train large diffusion models with masked transformers.
Specifically, we randomly mask out a high proportion of patches in diffused input images during training.
Experiments on ImageNet-256x256 and ImageNet-512x512 show that our approach achieves competitive and even better generative performance than the state-of-the-art Diffusion Transformer (DiT) model.
arXiv Detail & Related papers (2023-06-15T17:38:48Z)
- Enhancing Few-shot Image Classification with Cosine Transformer [4.511561231517167]
Few-shot Cosine Transformer (FS-CT) is a relational map between supports and queries.
Our method achieves competitive results on mini-ImageNet, CUB-200, and CIFAR-FS in 1-shot and 5-shot learning tasks.
Our FS-CT with cosine attention is a lightweight, simple few-shot algorithm that can be applied for a wide range of applications.
arXiv Detail & Related papers (2022-11-13T06:03:28Z)
- SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
Self-distillated masked AutoEncoder network SdAE is proposed in this paper.
With only 300 epochs pre-training, a vanilla ViT-Base model achieves an 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z)
- SOIT: Segmenting Objects with Instance-Aware Transformers [16.234574932216855]
This paper presents an end-to-end instance segmentation framework, termed SOIT, that Segments Objects with Instance-aware Transformers.
Inspired by DETR (Carion et al., 2020), our method views instance segmentation as a direct set prediction problem.
Experimental results on the MS COCO dataset demonstrate that SOIT outperforms state-of-the-art instance segmentation approaches significantly.
arXiv Detail & Related papers (2021-12-21T08:23:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.