Related papers: GraspSplats: Efficient Manipulation with 3D Feature Splatting

GraspSplats: Efficient Manipulation with 3D Feature Splatting

URL: http://arxiv.org/abs/2409.02084v1
Date: Tue, 3 Sep 2024 17:35:48 GMT
Title: GraspSplats: Efficient Manipulation with 3D Feature Splatting
Authors: Mazeyu Ji, Ri-Zhao Qiu, Xueyan Zou, Xiaolong Wang,
Abstract summary: We present GraspSplats, which generates high-quality scene representations in under 60 seconds. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods.
Score: 13.654484429008964
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods.

Related papers

VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction [46.31516096522758]
Recent advancements in camera-based occupancy prediction have focused on the simultaneous prediction of 3D semantics and scene flow.<n>We propose a novel regularization framework called VoxelSplat to address these challenges and their underlying causes.<n>Our framework uses the predicted scene flow to model the motion of Gaussians, and is thus able to learn the scene flow of moving objects in a self-supervised manner.
arXiv Detail & Related papers (2025-06-05T20:19:35Z)
GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field [18.520468059548865]
GSFF-SLAM is a novel dense semantic SLAM system based on 3D Gaussian Splatting. Our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals. When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03% mIoU.
arXiv Detail & Related papers (2025-04-28T01:21:35Z)
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering [1.42869709275202]
We present CrowdSplat, a novel approach that leverages 3D Gaussian Splatting for real-time, high-quality crowd rendering. CrowdSplat is a viable solution for dynamic, realistic crowd simulation in real-time applications.
arXiv Detail & Related papers (2025-01-29T17:31:46Z)
Occam's LGS: An Efficient Approach for Language Gaussian Splatting [57.00354758206751]
We show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. We apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique.
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
3D Gaussian Splatting (3DGS) allows for the compact encoding of both 3D geometry and scene appearance with its spatial features. We propose distilling dense keypoint descriptors into 3DGS to improve the model's spatial understanding. Our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality [50.179377002092416]
We propose an efficient visual localization method capable of high-quality rendering with fewer parameters. Our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.
arXiv Detail & Related papers (2024-09-21T08:46:16Z)
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting [44.42317312908314]
3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds. Current methods require highly controlled environments to meet the inter-view consistency assumption of 3DGS. We present SpotLessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors.
arXiv Detail & Related papers (2024-06-28T17:07:11Z)
LP-3DGS: Learning to Prune 3D Gaussian Splatting [71.97762528812187]
We propose learning-to-prune 3DGS, where a trainable binary mask is applied to the importance score that can find optimal pruning ratio automatically. Experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.
arXiv Detail & Related papers (2024-05-29T05:58:34Z)
Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting [27.974762304763694]
We introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features into a novel semantic component of 3D Gaussians. We build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference.
arXiv Detail & Related papers (2024-03-22T21:28:19Z)
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields [54.482261428543985]
Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. 3D Gaussian splatting has shown state-of-the-art performance on real-time radiance field rendering. We propose architectural and training changes to efficiently avert this problem.
arXiv Detail & Related papers (2023-12-06T00:46:30Z)
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering [71.44349029439944]
Recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering.
arXiv Detail & Related papers (2023-11-30T17:58:57Z)
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting [51.96353586773191]
We introduce textbfGS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping system. Our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets.
arXiv Detail & Related papers (2023-11-20T12:08:23Z)
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation. For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.