GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF
- URL: http://arxiv.org/abs/2210.06575v1
- Date: Wed, 12 Oct 2022 20:31:23 GMT
- Title: GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF
- Authors: Qiyu Dai, Yan Zhu, Yiran Geng, Ciyu Ruan, Jiazhao Zhang, He Wang
- Abstract summary: We propose a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, to achieve material-agnostic object grasping in clutter.
Compared to the existing NeRF-based 3-DoF grasp detection methods, our system can perform zero-shot NeRF construction with sparse RGB inputs and reliably detect 6-DoF grasps, both in real-time.
For training data, we generate a large-scale photorealistic domain-randomized synthetic dataset of grasping in cluttered tabletop scenes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we tackle 6-DoF grasp detection for transparent and specular
objects, which is an important yet challenging problem in vision-based robotic
systems, due to the failure of depth cameras in sensing their geometry. We, for
the first time, propose a multiview RGB-based 6-DoF grasp detection network,
GraspNeRF, that leverages the generalizable neural radiance field (NeRF) to
achieve material-agnostic object grasping in clutter. Compared to the existing
NeRF-based 3-DoF grasp detection methods that rely on densely captured input
images and time-consuming per-scene optimization, our system can perform
zero-shot NeRF construction with sparse RGB inputs and reliably detect 6-DoF
grasps, both in real-time. The proposed framework jointly learns generalizable
NeRF and grasp detection in an end-to-end manner, optimizing the scene
representation construction for the grasping. For training data, we generate a
large-scale photorealistic domain-randomized synthetic dataset of grasping in
cluttered tabletop scenes that enables direct transfer to the real world. Our
extensive experiments in synthetic and real-world environments demonstrate that
our method significantly outperforms all the baselines in all the experiments
while remaining in real-time.
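Any NeRF, including a generalizable one that predicts the field from sparse views in a feed-forward pass and skips per-scene optimization, turns the densities and colors sampled along a camera ray into a rendered color and an expected depth using the standard volume-rendering quadrature. The NumPy sketch below shows only that generic rule for a single ray. It assumes the densities and colors have already been predicted at the sample points. `render_ray` is a hypothetical name and is not part of GraspNeRF's code.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Alpha-composite samples along one ray (generic NeRF quadrature, illustrative only).

    sigmas: (N,) non-negative densities at the sample points
    colors: (N, 3) RGB values at the sample points
    t_vals: (N,) increasing distances of the samples along the ray
    Returns (rgb, expected_depth, weights).
    """
    # Length of each interval; the last one is treated as effectively infinite.
    deltas = np.append(np.diff(t_vals), 1e10)
    # Opacity contributed by each interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches sample i without being absorbed earlier.
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * t_vals).sum()
    return rgb, depth, weights

# Usage: one nearly opaque sample at t = 0.5 dominates the ray.
sig = np.array([0.0, 0.0, 1e6, 0.0, 0.0])
cols = np.array([[0, 0, 0], [0, 0, 0], [1.0, 0.5, 0.25], [0, 0, 0], [0, 0, 0]])
t = np.linspace(0.0, 1.0, 5)
rgb, depth, w = render_ray(sig, cols, t)
```

With a single near-opaque sample, all of the weight falls on that sample, so the rendered color is its color and the expected depth is its distance along the ray. This expected depth is the quantity a geometry-aware pipeline can read out of the field.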
Related papers
- NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections
Recent works have improved NeRF's ability to render detailed specular appearance of distant environment illumination, but are unable to synthesize consistent reflections of closer content.
We address these issues with an approach based on ray tracing.
Instead of querying an expensive neural network for the outgoing view-dependent radiance at points along each camera ray, our model casts rays from these points and traces them through the NeRF representation to render feature vectors.
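The secondary rays described above start at points on a camera ray and continue in a reflected direction. Assuming ideal mirror reflection about a unit surface normal n, the reflected direction of a view direction d is r = d - 2 (d . n) n. The sketch below computes that direction and samples points along the reflected ray. It is a simplification: how NeRF-Casting obtains normals and renders and aggregates feature vectors along these rays is not modeled. `reflect` and `reflected_ray_points` are hypothetical names.

```python
import numpy as np

def reflect(d, n):
    """Mirror-reflect direction d about normal n: r = d - 2 (d . n) n (both normalized first)."""
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    return d - 2.0 * np.dot(d, n) * n

def reflected_ray_points(x, d, n, t_vals):
    """Points x + t * r along the secondary ray cast from surface point x."""
    r = reflect(d, n)
    return x[None, :] + t_vals[:, None] * r[None, :]

# Usage: a ray heading down at 45 degrees onto a horizontal surface bounces back up.
r = reflect(np.array([1.0, -1.0, 0.0]), np.array([0.0, 1.0, 0.0]))
pts = reflected_ray_points(np.zeros(3), np.array([1.0, -1.0, 0.0]),
                           np.array([0.0, 2.0, 0.0]), np.array([0.0, 1.0]))
```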
Date: 2024-05-23T17:59:57Z
- ASGrasp: Generalizable Transparent Object Reconstruction and Grasping from RGB-D Active Stereo Camera
We propose ASGrasp, a 6-DoF grasp detection network that uses an RGB-D active stereo camera.
Our system distinguishes itself by its ability to directly utilize raw IR and RGB images for transparent object geometry reconstruction.
Our experiments demonstrate that ASGrasp can achieve over 90% success rate for generalizable transparent object grasping.
Date: 2024-05-09T09:44:51Z
- SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects
Acquiring accurate depth information of transparent objects using off-the-shelf RGB-D cameras is a well-known challenge in Computer Vision and Robotics.
NeRFs are learning-free approaches and have demonstrated wide success in novel view synthesis and shape recovery.
Our proposed method, SAID-NeRF, shows strong performance on depth completion datasets for transparent objects and on robotic grasping.
Date: 2024-03-28T17:28:32Z
- Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Mesh2NeRF is an approach to derive ground-truth radiance fields from textured meshes for 3D generation tasks.
We validate the effectiveness of Mesh2NeRF across various tasks.
Date: 2024-03-28T11:22:53Z
- Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs
We introduce Composable Object Volume NeRF (COV-NeRF), an object-composable NeRF model that is the centerpiece of a real-to-sim pipeline.
COV-NeRF extracts objects from real images and composes them into new scenes, generating photorealistic renderings and many types of 2D and 3D supervision.
Date: 2024-03-07T00:00:02Z
- STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines
STOPNet is a framework for 6-DoF object suction detection on production lines.
We propose a novel framework to reconstruct the scene on the production line depending only on RGB input, based on multiview stereo.
Our method generalizes to novel environments, novel arrangements and novel objects, both in simulation and the real world.
Date: 2023-10-09T13:39:06Z
- CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural Representations
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs) trained using LiDAR and camera data, respectively.
In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGM) alongside the NeRF model, and leverage this occupancy grid for improved sampling of points along a ray for rendering in metric space.
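Occupancy-guided sampling can be illustrated with a plain boolean voxel grid. The idea is to sample points along the ray and keep only those that fall in occupied cells, so that the radiance field is queried where geometry is likely. This is a simplified, non-differentiable sketch that assumes an axis-aligned grid with a known minimum corner and voxel size. CLONeR's occupancy grid is differentiable and is built alongside the NeRF, which this code does not attempt. `sample_occupied` is a hypothetical helper.

```python
import numpy as np

def sample_occupied(origin, direction, near, far, n_samples, grid, grid_min, voxel_size):
    """Uniformly sample a ray, then keep only the samples that land in occupied voxels.

    grid: boolean array (X, Y, Z); True marks an occupied voxel.
    grid_min: world coordinates of the grid's minimum corner.
    """
    t_vals = np.linspace(near, far, n_samples)
    pts = origin[None, :] + t_vals[:, None] * direction[None, :]
    # Voxel index of each sample point.
    idx = np.floor((pts - grid_min) / voxel_size).astype(int)
    # Samples outside the grid are treated as empty space.
    inside = np.all((idx >= 0) & (idx < np.array(grid.shape)), axis=1)
    occupied = np.zeros(n_samples, dtype=bool)
    ii = idx[inside]
    occupied[inside] = grid[ii[:, 0], ii[:, 1], ii[:, 2]]
    return t_vals[occupied], pts[occupied]

# Usage: a 4x4x4 grid with a single occupied voxel at index (2, 0, 0).
grid = np.zeros((4, 4, 4), dtype=bool)
grid[2, 0, 0] = True
t_kept, pts_kept = sample_occupied(np.array([0.0, 0.5, 0.5]), np.array([1.0, 0.0, 0.0]),
                                   0.0, 3.5, 8, grid, np.zeros(3), 1.0)
```

Only the two samples with 2 <= t < 3 survive, so the field would be evaluated at 2 of the 8 candidate points.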
Date: 2022-09-02T17:44:50Z
- Fast Fourier Convolution Based Remote Sensor Image Object Detection for Earth Observation
We propose a Frequency-aware Feature Pyramid Framework (FFPF) for remote sensing object detection.
F-ResNet is proposed to perceive the spectral context information by plugging the frequency domain convolution into each stage of the backbone.
The BSFPN uses a bilateral sampling strategy and skip connections to better model the association of object features at different scales.
Date: 2022-09-01T15:50:58Z
- NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields
We show that a Neural Radiance Fields (NeRF) representation of a scene can be used to train dense object descriptors.
We use an optimized NeRF to extract dense correspondences between multiple views of an object, and then use these correspondences as training data for learning a view-invariant representation of the object.
Dense correspondence models supervised with our method significantly outperform off-the-shelf learned descriptors by 106%.
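One way an optimized NeRF can yield correspondences between views is to render a depth for a pixel in one image and reproject the resulting 3D point into another image with known camera poses. The sketch below shows that reprojection for a pinhole camera. It assumes shared intrinsics `K`, 4x4 camera-to-world poses, and a single depth per pixel measured along the camera's z-axis. It ignores occlusion. `reproject` is a hypothetical name, not NeRF-Supervision's code.

```python
import numpy as np

def reproject(u, v, depth, K, cam_to_world_a, cam_to_world_b):
    """Map pixel (u, v) in view A, at the given z-depth, to its pixel coordinates in view B."""
    # Back-project the pixel to a 3D point in camera A coordinates.
    p_cam_a = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Camera A -> world (homogeneous coordinates).
    p_world = cam_to_world_a @ np.append(p_cam_a, 1.0)
    # World -> camera B.
    p_cam_b = np.linalg.inv(cam_to_world_b) @ p_world
    # Pinhole projection followed by the perspective divide.
    uvw = K @ p_cam_b[:3]
    return uvw[:2] / uvw[2]

# Usage: camera B is camera A shifted by +1 along x; the principal-point pixel moves left.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[0, 3] = 1.0
uv_b = reproject(50.0, 50.0, 2.0, K, pose_a, pose_b)
```

Pairs such as ((50, 50) in A, `uv_b` in B) are the kind of pixel correspondences that could serve as training targets for a descriptor network.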
Date: 2022-03-03T18:49:57Z
- iNeRF: Inverting Neural Radiance Fields for Pose Estimation
We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF).
NeRFs have been shown to be remarkably effective for the task of view synthesis.
Date: 2020-12-10T18:36:40Z
This list is automatically generated from the titles and abstracts of the papers in this site.