Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
- URL: http://arxiv.org/abs/2407.15771v1
- Date: Mon, 22 Jul 2024 16:22:28 GMT
- Title: Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
- Authors: Kangqi Ma, Hao Dong, Yadong Mu
- Abstract summary: This paper addresses the challenge of robotic grasping of general objects.
The proposed model first proposes a number of the most likely grasp points in the scene.
Around each grasp point, a module infers whether each voxel in its neighborhood is void or occupied by some object.
The model further estimates 6-DoF grasp poses utilizing the local occupancy-enhanced object shape information.
- Score: 24.00828999360765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenge of robotic grasping of general objects. Similar to prior research, the task takes a single-view 3D observation (i.e., a point cloud) captured by a depth camera as input. Crucially, successful object grasping depends heavily on a comprehensive understanding of the shape of objects within the scene. However, single-view observations often suffer from occlusions (both self-occlusions and inter-object occlusions), which lead to gaps in the point clouds, especially in complex cluttered scenes. This results in incomplete perception of the object shape and frequently causes failures or inaccurate pose estimation during object grasping. In this paper, we tackle this issue with an effective albeit simple solution, namely completing grasping-related scene regions through local occupancy prediction. Following prior practice, the proposed model first proposes a number of the most likely grasp points in the scene. Around each grasp point, a module infers whether each voxel in its neighborhood is void or occupied by some object. Importantly, the occupancy map is inferred by fusing both local and global cues. We implement a multi-group tri-plane scheme for efficiently aggregating long-distance contextual information. The model further estimates 6-DoF grasp poses utilizing the local occupancy-enhanced object shape information and returns the top-ranked grasp proposal. Comprehensive experiments on both the large-scale GraspNet-1Billion benchmark and a real robotic arm demonstrate that the proposed method can effectively complete the unobserved parts in cluttered and occluded scenes. Benefiting from the occupancy-enhanced feature, our model clearly outperforms competing methods under various performance metrics such as grasping average precision.
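The tri-plane idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary occupancy signal instead of learned features, max-pooling as the aggregation rule, and rotation about the z axis as the way to form several groups. All function names and parameters (`triplane_project`, `query_voxel`, `multi_group_triplanes`, `lo`, `hi`, `res`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Planes = Dict[str, List[List[int]]]


def triplane_project(points: Sequence[Point], lo: float, hi: float, res: int) -> Planes:
    """Max-pool binary occupancy from 3D points onto the XY, XZ and YZ planes.

    Assumption: the scene is a cube [lo, hi)^3 split into res cells per axis.
    """
    planes = {name: [[0] * res for _ in range(res)] for name in ("xy", "xz", "yz")}
    cell = (hi - lo) / res
    for p in points:
        if not all(lo <= v < hi for v in p):
            continue  # ignore points outside the cube
        i, j, k = (min(int((v - lo) / cell), res - 1) for v in p)
        planes["xy"][i][j] = 1  # collapse along z
        planes["xz"][i][k] = 1  # collapse along y
        planes["yz"][j][k] = 1  # collapse along x
    return planes


def query_voxel(planes: Planes, i: int, j: int, k: int) -> Tuple[int, int, int]:
    """Gather the three plane cells that voxel (i, j, k) projects onto."""
    return (planes["xy"][i][j], planes["xz"][i][k], planes["yz"][j][k])


def rotate_z(p: Point, angle: float) -> Point:
    """Rotate a point about the z axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def multi_group_triplanes(points: Sequence[Point], angles: Sequence[float],
                          lo: float, hi: float, res: int) -> List[Planes]:
    """Build one tri-plane per rotation angle (a hypothetical grouping scheme)."""
    return [triplane_project([rotate_z(p, a) for p in points], lo, hi, res)
            for a in angles]


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.5)]
    planes = triplane_project(pts, 0.0, 1.0, 4)
    print(query_voxel(planes, 3, 0, 2))  # voxel containing the second point
    print(query_voxel(planes, 0, 0, 2))  # empty voxel that still hits two planes
```

A voxel's three gathered values are only a hint about its occupancy: an empty voxel can still land on occupied cells in some planes. Projecting along several differently oriented frames is one plausible way to reduce that ambiguity, which is the motivation assumed here for the multi-group variant.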
Related papers
- Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors [8.701106353658346]
General Object-level mapping builds a 3D map of objects in a scene with detailed shapes and poses from multi-view sensor observations.
Recent work introduces generative shape priors for object-level mapping from sparse views, but is limited to single-category objects.
In this work, we propose a General Object-level Mapping system, GOM, which leverages a 3D diffusion model as shape prior with multi-category support and outputs Neural Radiance Fields (NeRFs) for both texture and geometry for all objects in a scene.
(2024-10-07T21:33:30Z)
- DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses.
We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass.
Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
(2024-03-20T15:41:32Z)
- ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
(2024-01-18T12:41:41Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
(2023-11-14T14:27:53Z)
- OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds [4.709764624933227]
We propose the first unsupervised method, called OGC, to simultaneously identify multiple 3D objects in a single forward pass.
We extensively evaluate our method on five datasets, demonstrating the superior performance for object part instance segmentation.
(2022-10-10T07:01:08Z)
- Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which makes it possible to formulate single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
(2022-08-04T17:59:56Z)
- 3D Object Classification on Partial Point Clouds: A Practical Perspective [91.81377258830703]
A point cloud is a popular shape representation adopted in 3D object classification.
This paper introduces a practical setting to classify partial point clouds of object instances under any poses.
A novel algorithm in an alignment-classification manner is proposed in this paper.
(2020-12-18T04:00:56Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
(2020-12-09T15:47:21Z)
- Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method [52.182698295053264]
We are interested in counting dense objects from remote sensing images. Compared with object counting in a natural scene, this task is challenging because of large scale variation, complex cluttered backgrounds, and arbitrary orientation.
To address these issues, we first construct a large-scale object counting dataset with remote sensing images, which contains four important geographic objects.
We then benchmark the dataset by designing a novel neural network that can generate a density map of an input image.
(2020-08-28T03:47:49Z)
- Single View Metrology in the Wild [94.7005246862618]
We present a novel approach to single view metrology that can recover the absolute scale of a scene represented by 3D heights of objects or camera height above the ground.
Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights.
We demonstrate state-of-the-art qualitative and quantitative results on several datasets as well as applications including virtual object insertion.
(2020-07-18T22:31:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.