STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects
on Production Lines
- URL: http://arxiv.org/abs/2310.05717v1
- Date: Mon, 9 Oct 2023 13:39:06 GMT
- Title: STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects
on Production Lines
- Authors: Yuxuan Kuang, Qin Han, Danshi Li, Qiyu Dai, Lian Ding, Dong Sun,
Hanlin Zhao, He Wang
- Abstract summary: STOPNet is a framework for 6-DoF object suction detection on production lines.
We propose a novel framework, based on multiview stereo, that reconstructs the scene on the production line from RGB input alone.
Our method generalizes to novel environments, novel arrangements, and novel objects, both in simulation and the real world.
- Score: 9.258345770382688
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present STOPNet, a framework for 6-DoF object suction
detection on production lines, with a focus on but not limited to transparent
objects, which is an important and challenging problem in robotic systems and
modern industry. Current methods that require depth input fail on transparent
objects because depth cameras cannot sense their geometry; in contrast, we
propose a novel framework, based on multiview stereo, that reconstructs the
scene on the production line from RGB input alone. Compared to existing
works, our method not only reconstructs the whole 3D scene to obtain
high-quality 6-DoF suction poses in real time, but also generalizes to novel
environments, novel arrangements, and novel objects, including challenging
transparent objects, both in simulation and the real world. Extensive
experiments in simulation and the real world show that our method significantly
surpasses the baselines and has better generalizability, which caters to
practical industrial needs.
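Since no implementation ships with this listing, here is a minimal numpy sketch of the recipe the abstract describes: project a voxel grid into each calibrated RGB view, aggregate what each voxel sees into a volumetric representation, and score suction candidates on the reconstructed surface. All names, the color-averaging stand-in for learned multiview features, and the planarity-based suction score are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def project(voxels, K, T_cam_world):
    """Pinhole projection of world-space points into one camera."""
    pts_cam = np.c_[voxels, np.ones(len(voxels))] @ T_cam_world.T  # (N, 4)
    uvw = pts_cam[:, :3] @ K.T                                     # (N, 3)
    z = np.maximum(uvw[:, 2], 1e-9)
    return uvw[:, :2] / z[:, None], pts_cam[:, 2]

def aggregate_views(voxels, images, Ks, Ts):
    """Stand-in for learned multiview aggregation: average the pixel
    colors each voxel projects to across all calibrated views."""
    feats, hits = np.zeros((len(voxels), 3)), np.zeros(len(voxels))
    for img, K, T in zip(images, Ks, Ts):
        uv, z = project(voxels, K, T)
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        ok = (z > 0) & (u >= 0) & (u < img.shape[1]) \
             & (v >= 0) & (v < img.shape[0])
        feats[ok] += img[v[ok], u[ok]]
        hits[ok] += 1
    return feats / np.maximum(hits, 1)[:, None]

def suction_score(point, normal, surface_pts, radius=0.02):
    """Toy suction quality: planarity of the surface patch around the
    contact point along the approach normal (flat patch -> near 1)."""
    near = surface_pts[np.linalg.norm(surface_pts - point, axis=1) < radius]
    offsets = (near - point) @ normal
    return float(np.exp(-1e4 * np.var(offsets)))
```

In the paper's setting, the aggregation step would feed a learned network that predicts scene geometry; the toy score above only conveys why a reconstructed surface (points plus normals) is enough to rank 6-DoF suction poses.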
Related papers
- ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation [18.140839442955485]
We develop a vision transformer-based algorithm for stereo depth recovery of transparent objects.
Our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation.
Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios.
arXiv Detail & Related papers (2024-09-13T15:44:38Z)
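ClearDepth (above) recovers depth with a vision transformer; as a primer on the quantity it estimates, here is classical block-matching stereo, which finds the horizontal shift between a rectified image pair that best explains each pixel. Purely illustrative, not the paper's architecture.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=48, patch=5):
    """Classical stereo on a rectified grayscale pair: for each pixel in
    the left image, find the horizontal shift d that minimizes the sum of
    absolute differences between patches; depth = focal * baseline / d."""
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            costs = [np.abs(ref - right[y - r:y + r + 1,
                                        x - d - r:x - d + r + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Transparent surfaces break the photo-consistency assumption this search relies on, which is precisely the failure mode the learned approach above targets.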
- Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of diffusion-model-based novel-view synthesizers in enhancing RGB 6D pose estimation at the category level.
The method reduces data requirements, removes the need for depth information in zero-shot category-level 6D pose estimation, and improves performance, as demonstrated quantitatively on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
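One way to read the Zero123-6D entry above is as render-and-compare with a learned novel-view synthesizer. The sketch below shows that generic loop; `synthesize_view` and `embed` are hypothetical stand-ins for the diffusion-based synthesizer and a feature extractor, and the actual method may differ.

```python
import numpy as np

def estimate_pose(query_feat, ref_image, candidate_poses,
                  synthesize_view, embed):
    """Generic render-and-compare: synthesize the reference object under
    each candidate pose, embed the result, and keep the pose whose
    rendering best matches the query features."""
    best_pose, best_sim = None, -np.inf
    for pose in candidate_poses:
        feat = embed(synthesize_view(ref_image, pose))  # hypothetical calls
        sim = float(feat @ query_feat /
                    (np.linalg.norm(feat) * np.linalg.norm(query_feat) + 1e-8))
        if sim > best_sim:
            best_pose, best_sim = pose, sim
    return best_pose, best_sim
```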
- Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs [59.12526668734703]
We introduce Composable Object Volume NeRF (COV-NeRF), an object-composable NeRF model that is the centerpiece of a real-to-sim pipeline.
COV-NeRF extracts objects from real images and composes them into new scenes, generating photorealistic renderings and many types of 2D and 3D supervision.
arXiv Detail & Related papers (2024-03-07T00:00:02Z)
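The object-composable rendering in the COV-NeRF entry above can be illustrated, at its simplest, as depth-aware compositing of independently rendered objects. A hypothetical sketch, not the model itself:

```python
import numpy as np

def compose_scene(layers):
    """Depth-aware compositing of independently rendered objects: at each
    pixel, keep the color of the nearest surface. `layers` is a list of
    (rgb HxWx3, depth HxW) pairs, with np.inf where a layer is empty."""
    rgbs = np.stack([rgb for rgb, _ in layers])    # (L, H, W, 3)
    depths = np.stack([d for _, d in layers])      # (L, H, W)
    nearest = depths.argmin(axis=0)                # winning layer per pixel
    h, w = nearest.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)[None, :]
    return rgbs[nearest, rows, cols], depths.min(axis=0)
```

The composite depth map doubles as one of the "many types of 2D and 3D supervision" such a pipeline can emit.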
- Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior [14.049778178534588]
We introduce a two-stage pipeline for reconstructing transparent objects, tailored for mobile platforms.
It uses Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors into a cross-view-consistent 3D reconstruction.
Our pipeline significantly outperforms baseline methods in 3D reconstruction quality.
arXiv Detail & Related papers (2023-10-15T21:30:06Z)
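A minimal stand-in for the fusion idea in the entry above: a per-view monocular depth prior is trusted only where it reprojects consistently into another view. The intrinsics `K`, relative pose `T_b_a`, and threshold below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def cross_view_consistency(depth_a, depth_b, K, T_b_a, thresh=0.02):
    """Keep pixels of view A whose (monocular) depth reprojects into view B
    at a matching depth -- a simplified cross-view consistency check."""
    h, w = depth_a.shape
    v, u = np.mgrid[0:h, 0:w]
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    pts_a = rays * depth_a.reshape(1, -1)              # back-project view A
    pts_b = T_b_a[:3, :3] @ pts_a + T_b_a[:3, 3:4]     # move into B's frame
    uvw = K @ pts_b
    z = np.maximum(uvw[2], 1e-9)
    ub = np.round(uvw[0] / z).astype(int)
    vb = np.round(uvw[1] / z).astype(int)
    ok = (uvw[2] > 0) & (ub >= 0) & (ub < w) & (vb >= 0) & (vb < h)
    agree = np.zeros(h * w, dtype=bool)
    agree[ok] = np.abs(uvw[2, ok] - depth_b[vb[ok], ub[ok]]) \
                < thresh * depth_b[vb[ok], ub[ok]]
    return agree.reshape(h, w)
```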
- Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving [77.07589573960436]
We introduce an efficient multi-scale voxel carving method to generate novel views of real scenes.
Our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module.
We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments.
arXiv Detail & Related papers (2023-06-26T13:57:05Z)
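For reference, the single-scale core of voxel carving fits in a few lines: a voxel survives only if it projects inside the object silhouette in every view. The paper's multi-scale scheme would repeat this on refined grids around survivors; the version below is a textbook sketch, not the authors' code.

```python
import numpy as np

def carve(grid_xyz, silhouettes, Ks, Ts):
    """A voxel survives only if it lands inside the (boolean HxW) object
    silhouette in every calibrated view; everything else is carved away."""
    keep = np.ones(len(grid_xyz), dtype=bool)
    for sil, K, T in zip(silhouettes, Ks, Ts):
        cam = (np.c_[grid_xyz, np.ones(len(grid_xyz))] @ T.T)[:, :3]
        z = np.maximum(cam[:, 2], 1e-9)
        uv = (cam @ K.T)[:, :2] / z[:, None]
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        inside = (cam[:, 2] > 0) & (u >= 0) & (u < sil.shape[1]) \
                 & (v >= 0) & (v < sil.shape[0])
        inside[inside] &= sil[v[inside], u[inside]]
        keep &= inside
    return grid_xyz[keep]  # multi-scale: refine a finer grid around survivors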
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model to extract objects of interest, and leverages a text-to-image diffusion model to lift each object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
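The Anything-3D entry above is a pipeline of off-the-shelf models, which can be summarized as glue code. Every callable below is a hypothetical wrapper (`blip` captions, `sam` segments from a click prompt, `lift_to_nerf` does diffusion-guided lifting); none of this is the authors' API.

```python
def anything_3d(image, click_xy, blip, sam, lift_to_nerf):
    """Glue-code view of the pipeline: caption the scene, segment the
    clicked object, lift the masked crop into a radiance field guided by
    the caption. All callables are hypothetical stand-ins."""
    caption = blip(image)                     # textual scene description
    mask = sam(image, point_prompt=click_xy)  # HxW boolean object mask
    obj = image * mask[..., None]             # keep only the object pixels
    return lift_to_nerf(obj, prompt=caption)  # text-to-3D style lifting
```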
- GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF [7.47805672405939]
We propose a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, to achieve material-agnostic object grasping in clutter.
Compared to existing NeRF-based 3-DoF grasp detection methods, our system performs zero-shot NeRF construction from sparse RGB inputs and reliably detects 6-DoF grasps, both in real time.
For training data, we generate a large-scale photorealistic domain-randomized synthetic dataset of grasping in cluttered tabletop scenes.
arXiv Detail & Related papers (2022-10-12T20:31:23Z)
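The grasp-detection half of GraspNeRF (above) can be caricatured as a 3D CNN head over an aggregated multiview feature grid, predicting per-voxel grasp quality, rotation, and width. Channel sizes and the 6D rotation parametrization below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraspHead(nn.Module):
    """3D CNN over an aggregated multiview feature grid, predicting
    per-voxel grasp quality, a continuous 6D rotation representation,
    and gripper opening width."""
    def __init__(self, c_in=32):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv3d(c_in, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU())
        self.quality = nn.Conv3d(32, 1, 1)   # grasp success probability
        self.rotation = nn.Conv3d(32, 6, 1)  # continuous 6D rotation
        self.width = nn.Conv3d(32, 1, 1)     # gripper opening

    def forward(self, feat_grid):            # feat_grid: (B, c_in, D, H, W)
        h = self.trunk(feat_grid)
        return torch.sigmoid(self.quality(h)), self.rotation(h), self.width(h)
```

For instance, `GraspHead()(torch.randn(1, 32, 40, 40, 40))` yields three volumetric outputs from which a best grasp per voxel can be read off.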
- SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo [4.317104502755003]
SimNet is trained as a single multi-headed neural network using simulated stereo data.
SimNet is evaluated on 2D car detection, unknown object detection, and deformable object keypoint detection.
By inferring grasp positions using the OBB and keypoint predictions, SimNet can be used to perform end-to-end manipulation of unknown objects.
arXiv Detail & Related papers (2021-06-30T15:18:14Z)
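The "single multi-headed network" design in the SimNet entry above is easy to sketch: one backbone consumes the stacked stereo pair, and several small heads decode the tasks. Dimensions and head choices below are illustrative, not SimNet's actual architecture.

```python
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """One shared backbone over the stacked stereo pair, with small heads
    for segmentation, keypoints, and oriented boxes."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.seg = nn.Conv2d(64, 2, 1)   # unknown-object mask logits
        self.kps = nn.Conv2d(64, 8, 1)   # keypoint heatmaps
        self.obb = nn.Conv2d(64, 5, 1)   # oriented-bounding-box params

    def forward(self, stereo_pair):      # (B, 6, H, W): left and right RGB
        f = self.backbone(stereo_pair)
        return self.seg(f), self.kps(f), self.obb(f)
```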
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible grasping viewpoints using hand-mounted RGB-D cameras.
A practical three-stage transferable active grasping pipeline is developed that adapts to unseen cluttered scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse-reward issue in grasping and to ensure category-irrelevant behavior; a sketch of one plausible form follows the entry.
arXiv Detail & Related papers (2020-04-28T08:15:35Z)
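The mask-guided reward in the last entry is described only at a high level; one plausible (entirely assumed) shaping densifies the sparse grasp-success signal with progress on the target's visible mask:

```python
import numpy as np

def mask_guided_reward(prev_mask, cur_mask, grasp_succeeded,
                       w_vis=0.1, w_succ=10.0):
    """Assumed shaping: reward viewpoint moves that expose more of the
    target's mask, plus a large terminal bonus on grasp success.
    Weights and functional form are guesses, not the paper's definition."""
    progress = (cur_mask.sum() - prev_mask.sum()) / max(prev_mask.size, 1)
    return w_vis * float(progress) + (w_succ if grasp_succeeded else 0.0)
```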