SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
- URL: http://arxiv.org/abs/2412.02140v1
- Date: Tue, 03 Dec 2024 03:56:01 GMT
- Title: SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
- Authors: Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu
- Abstract summary: We propose SparseGrasp, a novel open-vocabulary robotic grasping system.
SparseGrasp operates efficiently with sparse-view RGB images and handles scene updates quickly.
We show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability.
- Score: 125.66499135980344
- Abstract: Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to quickly update scenes, limiting their effectiveness in changeable environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that operates efficiently with sparse-view RGB images and handles scene updates quickly. Our system builds upon and significantly enhances existing computer vision modules in robotic learning. Specifically, SparseGrasp utilizes DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. Additionally, we introduce a novel render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changeable environments. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability, providing a robust solution for multi-turn grasping in changeable environments.
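The abstract mentions repurposing PCA to compress high-dimensional features from 2D vision foundation models before embedding them in the 3D Gaussian representation. A minimal sketch of that compression idea, using an SVD-based PCA over per-pixel feature vectors; the feature dimensions (512 in, 64 out) and the implementation are assumptions for illustration, not the authors' code:

```python
import numpy as np

def pca_compress(features: np.ndarray, k: int = 64):
    """Project (N, D) feature vectors onto their top-k principal components.

    Returns the compressed (N, k) features plus the basis and mean,
    which are needed to project new features into the same subspace.
    """
    mean = features.mean(axis=0, keepdims=True)
    centered = features - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                       # (k, D) projection basis
    return centered @ basis.T, basis, mean

# Hypothetical usage: compress 512-dim foundation-model features to 64 dims.
feats = np.random.rand(1000, 512).astype(np.float32)
compressed, basis, mean = pca_compress(feats, k=64)
```

Storing only the k-dimensional codes per Gaussian (and one shared basis) is what makes attaching dense semantic features to many Gaussians tractable.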
Related papers
- SparseLGS: Sparse View Language Embedded Gaussian Splatting [49.187761358726675]
We propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images.
Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs.
Experimental results show that SparseLGS achieves comparable quality when reconstructing semantic fields with fewer inputs.
arXiv Detail & Related papers (2024-12-03T08:18:56Z) - Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - Memorize What Matters: Emergent Scene Decomposition from Multitraverse [54.487589469432706]
We introduce 3D Gaussian Mapping, a camera-only offline mapping framework grounded in 3D Gaussian Splatting.
3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation.
We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering.
arXiv Detail & Related papers (2024-05-27T14:11:17Z) - InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds [91.77050739918037]
We introduce InstantSplat, a novel and lightning-fast neural reconstruction system that builds accurate 3D representations from as few as 2-3 images.
InstantSplat integrates dense stereo priors and co-visibility relationships between frames to initialize pixel-aligned 3D Gaussians by progressively expanding the scene.
It achieves an acceleration of over 20 times in reconstruction, improves visual quality (SSIM) from 0.3755 to 0.7624 compared to COLMAP with 3D-GS, and is compatible with multiple 3D representations.
arXiv Detail & Related papers (2024-03-29T17:29:58Z) - BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics Primitives [6.431806897364565]
Implicit neural representations have become pivotal in robotic perception, enabling robots to comprehend 3D environments from 2D images.
We propose a framework called bundle-adjusting accelerated neural graphics primitives (BAA-NGP)
Results demonstrate 10 to 20 x speed improvement compared to other bundle-adjusting neural radiance field methods.
arXiv Detail & Related papers (2023-06-07T05:36:45Z) - Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z) - Fast and Lightweight Scene Regressor for Camera Relocalization [1.6708069984516967]
Estimating the camera pose directly with respect to pre-built 3D models can be prohibitively expensive for several applications.
This study proposes a simple scene regression method that requires only a multi-layer perceptron network for mapping scene coordinates.
The proposed approach uses sparse descriptors to regress the scene coordinates, instead of a dense RGB image.
arXiv Detail & Related papers (2022-12-04T14:41:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.