RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features
- URL: http://arxiv.org/abs/2403.01731v1
- Date: Mon, 4 Mar 2024 05:03:24 GMT
- Title: RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features
- Authors: Howard H. Qian, Yangxiao Lu, Kejia Ren, Gaotian Wang, Ninad Khargonkar, Yu Xiang, Kaiyu Hang
- Abstract summary: We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%.
- Score: 6.358423536732677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to successfully perform manipulation tasks in new environments, such
as grasping, robots must be proficient in segmenting unseen objects from the
background and/or other objects. Previous works perform unseen object instance
segmentation (UOIS) by training deep neural networks on large-scale data to
learn RGB/RGB-D feature embeddings, where cluttered environments often result
in inaccurate segmentations. We build upon these methods and introduce a novel
approach to correct inaccurate segmentation, such as under-segmentation, of
static image-based UOIS masks by using robot interaction and a designed body
frame-invariant feature. We demonstrate that the relative linear and rotational
velocities of frames randomly attached to rigid bodies due to robot
interactions can be used to identify objects and accumulate corrected
object-level segmentation masks. By introducing motion to regions of
segmentation uncertainty, we are able to drastically improve segmentation
accuracy in an uncertainty-driven manner with minimal, non-disruptive
interactions (ca. 2-3 per scene). We demonstrate the effectiveness of our
proposed interactive perception pipeline in accurately segmenting cluttered
scenes by achieving an average object segmentation accuracy rate of 80.7%, an
increase of 28.2% when compared with other state-of-the-art UOIS methods.
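The central mechanism is worth making concrete: any two frames attached to the same rigid body keep a constant relative pose under motion, so pairwise distances between points on that body are preserved by a robot push, while distances across different bodies generally change. Below is a minimal sketch of that grouping idea, assuming 3D points tracked before and after an interaction; the function name and tolerance are illustrative, and the paper's actual feature is built from relative linear and rotational velocities of attached frames rather than raw distance preservation.

```python
import numpy as np

def rigid_motion_groups(p_t0, p_t1, tol=5e-3):
    """Group tracked 3D points into rigid-body hypotheses.

    p_t0, p_t1: (N, 3) arrays of the same tracked points before and
    after a robot push. Points on one rigid body preserve pairwise
    distances; points on different bodies generally do not.
    Hypothetical helper, not the paper's implementation.
    """
    n = len(p_t0)
    d0 = np.linalg.norm(p_t0[:, None] - p_t0[None, :], axis=-1)
    d1 = np.linalg.norm(p_t1[:, None] - p_t1[None, :], axis=-1)
    consistent = np.abs(d0 - d1) < tol  # (N, N) rigid-consistency graph

    # Connected components of the consistency graph -> object hypotheses.
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            for j in np.where(consistent[i] & (labels < 0))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

Accumulating such motion-derived groups over the 2-3 targeted pushes is what lets the pipeline correct under-segmented regions of the initial static masks.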
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but these open-loop approaches often overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
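A common way to estimate such an axis online is to fit a rigid transform between corresponding points on the moving part across two frames and read the rotation axis off the fitted rotation. The sketch below uses the Kabsch algorithm; names and the correspondence assumption are illustrative, not the paper's code.

```python
import numpy as np

def estimate_rotation_axis(p0, p1):
    """Recover an articulation axis direction from corresponding 3D
    points on a moving part (illustrative sketch).

    p0, p1: (N, 3) corresponding points before/after a small motion.
    """
    c0, c1 = p0.mean(axis=0), p1.mean(axis=0)
    H = (p0 - c0).T @ (p1 - c1)               # cross-covariance (Kabsch)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, det = +1

    # The rotation axis is the eigenvector of R with eigenvalue 1.
    w, v = np.linalg.eig(R)
    axis = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return axis / np.linalg.norm(axis)
```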
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Embodied Uncertainty-Aware Object Segmentation [38.52448300879023]
We introduce uncertainty-aware object instance segmentation (UncOS) and demonstrate its usefulness for embodied interactive segmentation.
We obtain a set of region-factored segmentation hypotheses together with confidence estimates by making multiple queries of large pre-trained models.
The output can also serve as input to a belief-driven process for selecting robot actions to perturb the scene to reduce ambiguity.
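One simple way to turn multiple hypotheses into an uncertainty signal is to measure per-pixel disagreement among them. The sketch below assumes K binary mask hypotheses, e.g. from repeated queries of a promptable segmentation model with varied prompts; UncOS's region-factored hypotheses are richer than this.

```python
import numpy as np

def hypothesis_uncertainty(masks):
    """Per-pixel uncertainty from K segmentation hypotheses.

    masks: (K, H, W) boolean array of hypotheses for one scene.
    Returns an (H, W) map in [0, 1]; values near 1 mark pixels
    where the hypotheses disagree most, i.e. where a perturbing
    robot action would be most informative.
    """
    p = masks.mean(axis=0)                      # foreground frequency
    eps = 1e-6                                  # avoid log(0)
    h = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))
    return h / np.log(2)                        # normalize to bits
```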
arXiv Detail & Related papers (2024-08-08T21:29:22Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
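One plausible reading of the selection mechanism: score each flow-predicted mask by its agreement with masks warped in from neighboring frames, and keep the most consistent ones as exemplars. The helper below is hypothetical and assumes the neighbor masks have already been warped (e.g. with optical flow).

```python
import numpy as np

def select_exemplars(masks, warped_neighbors, k=3):
    """Pick the k most temporally consistent masks as exemplars.

    masks: list of (H, W) boolean flow-predicted masks, one per frame.
    warped_neighbors: list of (H, W) boolean masks propagated into
    each frame from its temporal neighbors.
    Returns indices of the k exemplar frames.
    """
    def iou(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0

    scores = [iou(m, w) for m, w in zip(masks, warped_neighbors)]
    return np.argsort(scores)[::-1][:k]
```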
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction [23.572104156617844]
We introduce a novel robotic system for improving unseen object instance segmentation in the real world by leveraging long-term robot interaction with objects.
Our system defers the decision on segmenting objects until after a sequence of robot pushing actions.
We demonstrate the usefulness of our system by fine-tuning segmentation networks trained on synthetic data with real-world data collected by our system.
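A simplified way to "defer the decision" is to track per-frame instance masks across the pushing sequence and keep only instances that persist, using the survivors as pseudo-labels for fine-tuning. The IoU matching and thresholds below are assumptions, not the paper's pipeline.

```python
import numpy as np

def stable_pseudo_labels(mask_seq, min_track_len=5, iou_thr=0.6):
    """Keep instance masks that persist across a pushing sequence.

    mask_seq: list over frames; each frame is a list of (H, W)
    boolean instance masks. Returns the final mask of each track
    that survived at least min_track_len frames.
    """
    if not mask_seq or not mask_seq[0]:
        return []

    def iou(a, b):
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else 0.0

    tracks = [[m] for m in mask_seq[0]]       # one track per initial mask
    for frame in mask_seq[1:]:
        for m in frame:
            best = max(tracks, key=lambda t: iou(t[-1], m))
            if iou(best[-1], m) > iou_thr:
                best.append(m)                # extend the matched track
    return [t[-1] for t in tracks if len(t) >= min_track_len]
```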
arXiv Detail & Related papers (2023-02-07T23:11:29Z)
- Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene [8.357801312689622]
We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model.
We use a variant of contrastive learning to train a DCNN model for predicting semantic segmentation from RGB views in the target environment.
The proposed method outperforms models pre-trained on ImageNet and achieves competitive performance when using models that are trained for exactly the same task but on a different dataset.
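A standard way to implement such cross-view consistency is an InfoNCE loss over pixel embeddings at corresponding locations in two registered views; the sketch below follows that general recipe, and RegConsist's exact variant may differ.

```python
import torch
import torch.nn.functional as F

def cross_view_pixel_loss(feat_a, feat_b, temperature=0.1):
    """InfoNCE consistency loss for two registered views.

    feat_a, feat_b: (N, C) embeddings of N corresponding pixels,
    where row i of each tensor observes the same surface point.
    Corresponding pixels act as positives; all others as negatives.
    """
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    logits = a @ b.t() / temperature              # (N, N) similarities
    targets = torch.arange(len(a), device=a.device)
    return F.cross_entropy(logits, targets)
```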
arXiv Detail & Related papers (2022-10-04T20:10:14Z)
- Self-Supervised Interactive Object Segmentation Through a Singulation-and-Grasping Approach [9.029861710944704]
We propose a robot learning approach to interact with novel objects and collect each object's training label.
The Singulation-and-Grasping (SaG) policy is trained through end-to-end reinforcement learning.
Our system achieves 70% singulation success rate in simulated cluttered scenes.
arXiv Detail & Related papers (2022-07-19T15:01:36Z)
- Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
A popular recent solution leverages RGB-D features learned from large-scale synthetic data and applies the model to unseen real-world scenarios.
This paper re-emphasizes the adaptation process across Sim2Real domains.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
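Adapting only the BatchNorm layers is a lightweight form of test-time adaptation: statistics estimated on the synthetic (Sim) training data are replaced with statistics from unlabeled real test batches. The PyTorch sketch below shows that common baseline; FTEA additionally updates BN parameters, which is omitted here, and the loader format is assumed.

```python
import torch
import torch.nn as nn

def adapt_batchnorm(model, test_loader, device="cuda"):
    """Re-estimate BatchNorm statistics on unlabeled test data."""
    model.to(device).eval()
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                 # use and update batch statistics
            m.reset_running_stats()   # drop the source-domain statistics
            m.momentum = None         # cumulative average over test batches
    with torch.no_grad():
        for rgbd, _ in test_loader:   # assumes (input, target) batches
            model(rgbd.to(device))
    model.eval()
    return model
```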
arXiv Detail & Related papers (2022-04-21T02:35:20Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks to sample smart perturbations to the segmentations, and a graph neural network that encodes relations between objects to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
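A graph over instance masks can be built by connecting masks whose (slightly dilated) supports touch, giving a GNN something to score; the construction below is only illustrative, not RICE's code.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def build_mask_graph(masks, dilate=5):
    """Nodes are instance masks; edges connect adjacent masks.

    masks: (K, H, W) boolean array of instance masks.
    Returns a list of (i, j) edges between neighboring objects.
    """
    grown = [binary_dilation(m, iterations=dilate) for m in masks]
    edges = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if np.logical_and(grown[i], grown[j]).any():
                edges.append((i, j))
    return edges
```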
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_17$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods both in segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
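A typical metric-learning loss of this kind pulls each pixel embedding toward its object's mean and pushes different objects' means apart; the margins and exact form below are a standard formulation, not necessarily the paper's, and the two-stage clustering is not reproduced.

```python
import torch

def push_pull_loss(embeddings, labels, margin_pull=0.5, margin_push=1.5):
    """Discriminative loss on pixel-wise feature embeddings.

    embeddings: (N, C) pixel features; labels: (N,) instance ids.
    """
    means, pull = [], 0.0
    for obj in labels.unique():
        f = embeddings[labels == obj]
        mu = f.mean(dim=0)
        means.append(mu)
        d = (f - mu).norm(dim=1)                      # pull to the mean
        pull = pull + torch.clamp(d - margin_pull, min=0).pow(2).mean()
    means = torch.stack(means)
    k = len(means)
    push = 0.0
    if k > 1:
        dists = torch.cdist(means, means)             # pairwise mean gaps
        off = dists[~torch.eye(k, dtype=torch.bool, device=dists.device)]
        push = torch.clamp(margin_push - off, min=0).pow(2).mean()
    return pull / k + push
```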
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
- Unseen Object Instance Segmentation for Robotic Environments [67.88276573341734]
We propose a method to segment unseen object instances in tabletop environments.
UOIS-Net comprises two stages: first, it operates only on depth to produce object instance center votes in 2D or 3D.
Surprisingly, our framework is able to learn from synthetic RGB-D data where the RGB is non-photorealistic.
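Center voting can be made concrete as follows: each depth pixel, back-projected to 3D, predicts an offset toward its object's center, and the resulting votes are clustered into instances. The greedy clustering below is a naive stand-in for the paper's stage-one machinery.

```python
import numpy as np

def cluster_center_votes(points, offsets, bandwidth=0.05):
    """Greedily cluster per-pixel 3D center votes into instances.

    points: (N, 3) 3D points back-projected from depth.
    offsets: (N, 3) predicted offsets, so points + offsets are votes
    for object centers. bandwidth is in meters (an assumption).
    """
    votes = points + offsets
    centers = []
    labels = -np.ones(len(votes), dtype=int)
    for i, v in enumerate(votes):
        for c_idx, c in enumerate(centers):
            if np.linalg.norm(v - c) < bandwidth:
                labels[i] = c_idx
                break
        else:                                  # no nearby center: new one
            centers.append(v)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```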
arXiv Detail & Related papers (2020-07-16T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.