STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for
Warehouse Picking Robots
- URL: http://arxiv.org/abs/2311.02337v1
- Date: Sat, 4 Nov 2023 06:52:38 GMT
- Authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox
- Abstract summary: We propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module.
The experiments we conduct show that our approach significantly outperforms recent methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmentation and tracking of unseen object instances in discrete frames pose
a significant challenge in dynamic industrial robotic contexts, such as
distribution warehouses. Here, robots must handle object rearrangement,
including shifting, removal, and partial occlusion by new items, and track
these items after substantial temporal gaps. The task is further complicated
when robots encounter objects not learned in their training sets, which
requires the ability to segment and track previously unseen items. Considering
that continuous observation is often inaccessible in such settings, our task
involves working with a discrete set of frames separated by indefinite periods
during which substantial changes to the scene may occur. This task also
translates to domestic robotic applications, such as rearrangement of objects
on a table. To address these demanding challenges, we introduce new synthetic
and real-world datasets that replicate these industrial and household
scenarios. We also propose a novel paradigm for joint segmentation and tracking
in discrete frames along with a transformer module that facilitates efficient
inter-frame communication. The experiments we conduct show that our approach
significantly outperforms recent methods. For additional results and videos,
please visit https://sites.google.com/view/stow-corl23. Code and dataset will
be released.
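The abstract's core idea, object queries from discrete frames exchanging information through a transformer module and being associated by similarity, can be illustrated with a minimal sketch. This is not the authors' implementation; the function names (`cross_frame_attention`, `associate`) and the greedy argmax association are illustrative assumptions, standing in for the paper's actual inter-frame communication module.

```python
# Hypothetical sketch (not the STOW authors' code): per-frame object query
# embeddings communicate via cross-attention, and pairwise query similarity
# yields an inter-frame association.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(queries_a, queries_b, scale=None):
    """One cross-attention step: frame-A queries attend to frame-B queries.

    queries_a: (Na, d) array, queries_b: (Nb, d) array.
    Returns the residually updated A-queries and the (Na, Nb) attention map.
    """
    d = queries_a.shape[-1]
    scale = scale or 1.0 / np.sqrt(d)
    attn = softmax(queries_a @ queries_b.T * scale)  # (Na, Nb)
    return queries_a + attn @ queries_b, attn        # residual update

def associate(queries_a, queries_b):
    """Greedy inter-frame association from the attention map."""
    _, attn = cross_frame_attention(queries_a, queries_b)
    return attn.argmax(axis=1)  # for each A-query, best-matching B-query
```

With unit-normalized queries, each object in frame A is matched to the frame-B query it is most similar to, which is how an attention-based module can recover object identity across an indefinite temporal gap without relying on motion continuity.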
Related papers
- RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant
Features [6.358423536732677]
We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%.
arXiv Detail & Related papers (2024-03-04T05:03:24Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast
Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment.
We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance.
We evaluate two recent transformer-based video models and find that, while they can track targets surprisingly well under certain task variations, a considerable performance gap remains before a tracking model can be said to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
- Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot
Interaction [23.572104156617844]
We introduce a novel robotic system for improving unseen object instance segmentation in the real world by leveraging long-term robot interaction with objects.
Our system defers the decision on segmenting objects after a sequence of robot pushing actions.
We demonstrate the usefulness of our system by fine-tuning segmentation networks trained on synthetic data with real-world data collected by our system.
arXiv Detail & Related papers (2023-02-07T23:11:29Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
- Semantically Grounded Object Matching for Robust Robotic Scene
Rearrangement [21.736603698556042]
We present a novel approach to object matching that uses a large pre-trained vision-language model to match objects in a cross-instance setting.
We demonstrate that this provides considerably improved matching performance in cross-instance settings.
arXiv Detail & Related papers (2021-11-15T18:39:43Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph
Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- TrackFormer: Multi-Object Tracking with Transformers [92.25832593088421]
TrackFormer is an end-to-end multi-object tracking and segmentation model based on an encoder-decoder Transformer architecture.
New track queries are spawned by the DETR object detector and embed the position of their corresponding object over time.
TrackFormer achieves a seamless data association between frames in a new tracking-by-attention paradigm.
arXiv Detail & Related papers (2021-01-07T18:59:29Z)
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
arXiv Detail & Related papers (2020-06-08T18:07:18Z)
- Instance Segmentation of Visible and Occluded Regions for Finding and
Picking Target from a Pile of Objects [25.836334764387498]
We present a robotic system for picking from a pile of objects that is capable of finding and grasping a specified target object.
We extend an existing instance segmentation model with a novel 'relook' architecture, in which the model explicitly learns the inter-instance relationship.
Also, by using image synthesis, we make the system capable of handling new objects without human annotations.
arXiv Detail & Related papers (2020-01-21T12:28:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.