Self-Supervised Interactive Object Segmentation Through a
Singulation-and-Grasping Approach
- URL: http://arxiv.org/abs/2207.09314v2
- Date: Wed, 20 Jul 2022 18:24:47 GMT
- Title: Self-Supervised Interactive Object Segmentation Through a
Singulation-and-Grasping Approach
- Authors: Houjian Yu and Changhyun Choi
- Abstract summary: We propose a robot learning approach to interact with novel objects and collect each object's training label.
The Singulation-and-Grasping (SaG) policy is trained through end-to-end reinforcement learning.
Our system achieves a 70% singulation success rate in simulated cluttered scenes.
- Score: 9.029861710944704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instance segmentation with unseen objects is a challenging problem in
unstructured environments. To solve this problem, we propose a robot learning
approach to actively interact with novel objects and collect each object's
training label for further fine-tuning to improve the segmentation model
performance, while avoiding the time-consuming process of manually labeling a
dataset. The Singulation-and-Grasping (SaG) policy is trained through
end-to-end reinforcement learning. Given a cluttered pile of objects, our
approach chooses pushing and grasping motions to break the clutter and conducts
object-agnostic grasping for which the SaG policy takes as input the visual
observations and imperfect segmentation. We decompose the problem into three
subtasks: (1) the object singulation subtask aims to separate the objects from
each other, which creates more space that alleviates the difficulty of (2) the
collision-free grasping subtask; and (3) the mask generation subtask, which obtains the
self-labeled ground truth masks by using an optical flow-based binary
classifier and motion cue post-processing for transfer learning. Our system
achieves a 70% singulation success rate in simulated cluttered scenes. The
interactive segmentation of our system achieves 87.8%, 73.9%, and 69.3% average
precision for toy blocks, YCB objects in simulation and real-world novel
objects, respectively, which outperforms several baselines.
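As a rough illustration of the self-labeling idea in the abstract, the sketch below derives a binary object mask from the change between two observations and scores it against a reference mask. It is not the paper's method: plain frame differencing stands in for the optical flow-based binary classifier, and the names `motion_mask` and `mask_iou`, the `threshold` value, and the toy 4x4 grids are all assumptions made for this example.

```python
def motion_mask(before, after, threshold=0.1):
    """Mark pixels whose value changed by more than `threshold`.

    `before` and `after` are equally sized 2D lists of floats
    (hypothetical depth or intensity images taken around an interaction).
    Frame differencing is an assumed stand-in for the paper's
    optical flow-based classifier.
    """
    return [
        [1 if abs(a - b) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks (2D lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a & b
            union += a | b
    return inter / union if union else 0.0


# Toy scene: a 2x2 object sits in the middle, then is grasped and removed.
before = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
after = [[0.0] * 4 for _ in range(4)]

# The changed region becomes the self-generated label for the object.
label = motion_mask(before, after)

# A hypothetical imperfect prediction that covers only half of the object.
prediction = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
score = mask_iou(prediction, label)
```

Here the label recovers the full 2x2 object region, and the half-covering prediction scores an IoU of 0.5 against it. Per-mask IoU of this kind is the usual ingredient of average-precision metrics, though the exact evaluation protocol behind the reported numbers is not specified here.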
Related papers
- Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches typically overlook the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z)
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries.
We show that the proposed method sets a new state of the art on multiple datasets.
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features [6.358423536732677]
We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes by achieving an average object segmentation accuracy rate of 80.7%.
arXiv Detail & Related papers (2024-03-04T05:03:24Z)
- AGILE: Approach-based Grasp Inference Learned from Element Decomposition [2.812395851874055]
Humans can grasp objects by taking into account hand-object positioning information.
This work proposes a method that enables a robot manipulator to learn the same skill and grasp objects in an optimal way.
arXiv Detail & Related papers (2024-02-02T10:47:08Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects well blended with surrounding environments.
It is hard to distinguish concealed objects from the background due to the intrinsic similarity.
We propose a new WSCOS method to address these two challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z)
- FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection [4.534713782093219]
A novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems.
FGAHOI comprises three dedicated components, namely multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM).
arXiv Detail & Related papers (2023-01-08T03:53:50Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects [25.836334764387498]
We present a robotic system for picking a target from a pile of objects that is capable of finding and grasping the target object.
We extend an existing instance segmentation model with a novel 'relook' architecture, in which the model explicitly learns the inter-instance relationship.
Also, by using image synthesis, we make the system capable of handling new objects without human annotations.
arXiv Detail & Related papers (2020-01-21T12:28:37Z)