CabiNet: Scaling Neural Collision Detection for Object Rearrangement
with Procedural Scene Generation
- URL: http://arxiv.org/abs/2304.09302v1
- Date: Tue, 18 Apr 2023 21:09:55 GMT
- Title: CabiNet: Scaling Neural Collision Detection for Object Rearrangement
with Procedural Scene Generation
- Authors: Adithyavairavan Murali, Arsalan Mousavian, Clemens Eppner, Adam
Fishman, Dieter Fox
- Abstract summary: We first generate over 650K cluttered scenes - orders of magnitude more than prior work - in diverse everyday environments.
We render synthetic partial point clouds from this data and use it to train our CabiNet model architecture.
CabiNet is a collision model that accepts object and scene point clouds, captured from a single-view depth observation.
- Score: 54.68738348071891
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the important problem of generalizing robotic rearrangement to
clutter without any explicit object models. We first generate over 650K
cluttered scenes - orders of magnitude more than prior work - in diverse
everyday environments, such as cabinets and shelves. We render synthetic
partial point clouds from this data and use it to train our CabiNet model
architecture. CabiNet is a collision model that accepts object and scene point
clouds, captured from a single-view depth observation, and predicts collisions
for SE(3) object poses in the scene. Our representation has a fast inference
speed of 7 microseconds per query with nearly 20% higher performance than
baseline approaches in challenging environments. We use this collision model in
conjunction with a Model Predictive Path Integral (MPPI) planner to generate
collision-free trajectories for picking and placing in clutter. CabiNet also
predicts waypoints, computed from the scene's signed distance field (SDF), that
allow the robot to navigate tight spaces during rearrangement. This improves
rearrangement performance by nearly 35% compared to baselines. We
systematically evaluate our approach, procedurally generate simulated
experiments, and demonstrate that our approach directly transfers to the real
world, despite training exclusively in simulation. Robot experiment demos with
completely unknown scenes and objects can be found at
https://cabinet-object-rearrangement.github.io
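The query interface described above - scene and object point clouds in, collision predictions for a batch of SE(3) object poses out - can be sketched as follows. This is a minimal illustration, not the CabiNet network: a brute-force nearest-neighbor distance check stands in for the learned model, and the function name, clearance threshold, and toy data are all assumptions.

```python
import numpy as np

def check_collisions(scene_pts, object_pts, poses, clearance=0.01):
    """Predict collisions for a batch of SE(3) object poses.

    scene_pts : (N, 3) scene point cloud from a single depth view.
    object_pts: (M, 3) object point cloud in its local frame.
    poses     : (B, 4, 4) homogeneous SE(3) transforms to test.
    Returns a (B,) boolean array, True meaning predicted collision.
    (A learned model such as CabiNet replaces the distance test below.)
    """
    results = np.empty(len(poses), dtype=bool)
    for i, T in enumerate(poses):
        # Transform object points into the scene frame.
        pts = object_pts @ T[:3, :3].T + T[:3, 3]
        # Stand-in for the network: collide if any transformed object
        # point comes within `clearance` of a scene point.
        d2 = ((pts[:, None, :] - scene_pts[None, :, :]) ** 2).sum(-1)
        results[i] = bool((d2 < clearance ** 2).any())
    return results

# Toy scene: a flat patch of points at z = 0 (e.g. a shelf surface).
xy = np.stack(np.meshgrid(np.linspace(-1, 1, 21),
                          np.linspace(-1, 1, 21)), -1).reshape(-1, 2)
scene = np.concatenate([xy, np.zeros((len(xy), 1))], axis=1)
# Toy object: a 3x3x3 lattice of points on a 10 cm cube.
cube = np.array([[x, y, z] for x in (-0.05, 0.0, 0.05)
                           for y in (-0.05, 0.0, 0.05)
                           for z in (-0.05, 0.0, 0.05)])

hover = np.eye(4); hover[2, 3] = 0.5   # held well above the patch
touch = np.eye(4)                      # intersecting the patch
print(check_collisions(scene, cube, np.stack([hover, touch])))
# -> [False  True]
```

At 7 microseconds per query the real model is fast enough to score thousands of candidate placements inside a sampling-based planner; the brute-force distance check above scales as O(N*M) and serves only to make the interface concrete.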
Related papers
- PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
- Uncertainty-aware Active Learning of NeRF-based Object Models for Robot Manipulators using Visual and Re-orientation Actions [8.059133373836913]
This paper presents an approach that enables a robot to rapidly learn the complete 3D model of a given object for manipulation in unfamiliar orientations.
We use an ensemble of partially constructed NeRF models to quantify model uncertainty to determine the next action.
Our approach determines when and how to grasp and re-orient an object given its partial NeRF model and re-estimates the object pose to rectify misalignments introduced during the interaction.
arXiv Detail & Related papers (2024-04-02T10:15:06Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos [62.34712951567793]
The ability to forecast human-environment collisions from egocentric observations is vital to enable collision avoidance in applications such as VR, AR, and wearable assistive robotics.
We introduce the challenging problem of predicting collisions in diverse environments from multi-view egocentric videos captured from body-mounted cameras.
We propose a transformer-based model called COPILOT to perform collision prediction and localization simultaneously.
arXiv Detail & Related papers (2022-10-04T17:49:23Z)
- iSDF: Real-Time Neural Signed Distance Fields for Robot Perception [64.80458128766254]
iSDF is a continuous learning system for real-time signed distance field reconstruction.
It produces more accurate reconstructions and better approximations of collision costs and gradients.
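Collision costs and gradients fall out of a signed distance field almost directly: the field value at a point is its distance to the nearest surface (negative inside), and a hinge on that value gives a cost whose gradient pushes a robot away from obstacles. In the sketch below a toy analytic sphere stands in for iSDF's learned field; the cost shape, margin, and function names are illustrative assumptions.

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=0.3):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p - center) - radius

def collision_cost(p, sdf, margin=0.1):
    """Hinge cost: zero when farther than `margin` from the surface,
    growing linearly as the point approaches or penetrates it."""
    return max(0.0, margin - sdf(p))

def cost_gradient(p, sdf, margin=0.1, eps=1e-5):
    """Finite-difference gradient of the collision cost. It points
    toward the obstacle, so a planner descends by moving away."""
    g = np.zeros(3)
    for i in range(3):
        dp = np.zeros(3); dp[i] = eps
        g[i] = (collision_cost(p + dp, sdf, margin)
                - collision_cost(p - dp, sdf, margin)) / (2 * eps)
    return g

far   = np.array([1.0, 0.0, 0.0])    # sdf = 0.7  -> cost 0
close = np.array([0.35, 0.0, 0.0])   # sdf = 0.05 -> cost 0.05
print(collision_cost(far, sphere_sdf), collision_cost(close, sphere_sdf))
print(cost_gradient(close, sphere_sdf))  # approx [-1, 0, 0]
```

A learned field like iSDF exposes the same two quantities, but evaluated by a network over reconstructions of real scenes rather than by a closed-form shape.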
arXiv Detail & Related papers (2022-04-05T15:48:39Z)
- PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds [4.381579507834533]
3D scene understanding from point clouds plays a vital role for various robotic applications.
Current state-of-the-art methods use separate neural networks for different tasks like object detection or room layout estimation.
We propose the first transformer architecture that predicts 3D objects and layouts simultaneously.
arXiv Detail & Related papers (2021-09-12T17:31:59Z)
- SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks [38.042876641457255]
We propose a depth-conditioned Variational Auto-Encoder (VAE) trained on a dataset of objects stacked under physics simulation.
We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn't require setting the maximum number of objects in the scene.
Our method has practical applications in giving robots some of the human ability to make rapid, intuitive inferences about partially observed scenes.
arXiv Detail & Related papers (2021-03-30T15:42:43Z)
- Object Rearrangement Using Learned Implicit Collision Functions [61.90305371998561]
We propose a learned collision model that accepts scene and query object point clouds and predicts collisions for 6DOF object poses within the scene.
We leverage the learned collision model as part of a model predictive path integral (MPPI) policy in a tabletop rearrangement task.
The learned model outperforms both traditional pipelines and learned ablations by 9.8% in accuracy on a dataset of simulated collision queries.
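The MPPI scheme used both here and in CabiNet can be sketched in a few lines: sample noisy control sequences around a nominal plan, roll them out, score each rollout with goal and collision costs, and update the plan as a softmax-weighted average of the samples. The 2D point-mass dynamics, the hand-coded sphere obstacle (standing in for the learned collision model), and all constants below are illustrative assumptions.

```python
import numpy as np

def mppi_step(x0, goal, obstacle, radius, u_nom,
              n_samples=256, sigma=0.2, lam=0.05, rng=None):
    """One MPPI update for a 2D point mass with dynamics x_{t+1} = x_t + u_t."""
    rng = rng or np.random.default_rng(0)
    horizon = len(u_nom)
    # Sample perturbed control sequences around the nominal plan.
    u = u_nom[None] + rng.normal(0.0, sigma, size=(n_samples, horizon, 2))
    traj = x0 + np.cumsum(u, axis=1)                  # (S, H, 2) rollouts
    goal_cost = np.linalg.norm(traj[:, -1] - goal, axis=-1)
    # Collision term: large penalty if any state enters the obstacle.
    # (The papers replace this geometric check with a learned model.)
    inside = np.linalg.norm(traj - obstacle, axis=-1) < radius
    cost = goal_cost + 1e3 * inside.any(axis=1)
    # Softmax-weighted average of the sampled controls.
    w = np.exp(-(cost - cost.min()) / lam)
    w /= w.sum()
    return (w[:, None, None] * u).sum(axis=0)

x0, goal = np.zeros(2), np.array([2.0, 0.0])
obstacle, radius = np.array([1.0, 0.3]), 0.3          # obstacle near the path
u = np.zeros((20, 2))
rng = np.random.default_rng(1)
for _ in range(30):                                    # iterative refinement
    u = mppi_step(x0, goal, obstacle, radius, u, rng=rng)
traj = x0 + np.cumsum(u, axis=0)
print(np.linalg.norm(traj[-1] - goal))                 # endpoint error
```

In CabiNet the geometric `inside` test becomes a batched query to the learned collision network, and the SDF-derived waypoints bias the sampled trajectories through tight openings such as cabinet doors.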
arXiv Detail & Related papers (2020-11-21T05:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.