Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
- URL: http://arxiv.org/abs/2403.09875v3
- Date: Thu, 15 Aug 2024 21:46:42 GMT
- Title: Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
- Authors: Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III
- Abstract summary: We propose a novel method to supervise 3D Gaussian Splatting scenes using optical tactile sensors.
We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone.
- Score: 13.895893586777802
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable for directly supervising a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process: coarse alignment with a depth camera, followed by fine adjustment to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Using this additional information, we propose a new loss function, the variance-weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and a RealSense RGB-D camera to show that combining touch and vision in this manner yields quantitatively and qualitatively better results than vision or touch alone for few-view scene synthesis on opaque as well as reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs
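To make the variance-weighted depth supervised loss concrete, here is a minimal sketch of one plausible form; the function name, the inverse-variance weighting, and the squared-error term are assumptions of this sketch, not code released with the paper.

```python
import torch

def variance_weighted_depth_loss(rendered_depth: torch.Tensor,
                                 fused_depth: torch.Tensor,
                                 fused_variance: torch.Tensor,
                                 eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical variance-weighted depth loss: pixels whose fused
    touch/vision depth is more certain (lower variance) contribute more."""
    weights = 1.0 / (fused_variance + eps)          # inverse-variance weighting (assumed form)
    per_pixel = weights * (rendered_depth - fused_depth) ** 2
    return per_pixel.mean()
```

In training, a term like this would be added to the standard 3DGS photometric loss with a scalar weight, so the fused depth and uncertainty maps guide the Gaussians where vision alone is unreliable.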
Related papers
- Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting [27.45827655042124]
We propose a framework for active next-best-view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS).
We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method.
We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty.
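As a rough illustration of uncertainty-driven selection (the paper extends FisherRF; the scorer below is a simplified stand-in that merely ranks candidate poses by mean rendered depth uncertainty, and `render_depth_uncertainty` is a hypothetical hook into a 3DGS renderer):

```python
import numpy as np

def select_next_best_view(candidate_poses, render_depth_uncertainty):
    """Pick the candidate pose whose rendered depth-uncertainty map has
    the highest mean value, i.e. the view expected to be most informative."""
    scores = [float(np.mean(render_depth_uncertainty(pose)))
              for pose in candidate_poses]
    return int(np.argmax(scores))
```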
arXiv Detail & Related papers (2024-10-07T01:24:39Z)
- GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization [1.4466437171584356]
3D Gaussian Splatting (3DGS) allows for the compact encoding of both 3D geometry and scene appearance with its spatial features.
We propose distilling dense keypoint descriptors into 3DGS to improve the model's spatial understanding.
Our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.
arXiv Detail & Related papers (2024-09-24T23:18:32Z)
- PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting [59.277480452459315]
We propose a principled spatial sensitivity pruning score that outperforms current approaches.
We also propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model.
Our pipeline increases the average rendering speed of 3D-GS by 2.65× while retaining more salient foreground information.
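A minimal sketch of such a multi-round prune-refine loop, under stated assumptions: `sensitivity_score` stands in for the paper's principled spatial sensitivity score, and `refine` for a short fine-tuning pass; both are hypothetical hooks.

```python
import numpy as np

def prune_refine(gaussians, sensitivity_score, refine,
                 rounds=2, keep_ratio=0.5):
    """Hypothetical multi-round prune-refine: each round keeps the
    most sensitive Gaussians, then fine-tunes the survivors."""
    for _ in range(rounds):
        scores = np.array([sensitivity_score(g) for g in gaussians])
        k = max(1, int(len(gaussians) * keep_ratio))
        keep = np.argsort(scores)[-k:]              # indices of the top-k scores
        gaussians = refine([gaussians[i] for i in keep])
    return gaussians
```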
arXiv Detail & Related papers (2024-06-14T17:53:55Z)
- Tactile-Augmented Radiance Fields [23.3063261842082]
We present a scene representation, which we call a tactile-augmented radiance field (TaRF).
This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene.
We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes.
arXiv Detail & Related papers (2024-05-07T17:59:50Z)
- HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2 [1.1874952582465603]
We leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting.
We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps.
We evaluate our approach on two self-captured scenes: An outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant.
arXiv Detail & Related papers (2024-05-03T11:08:04Z)
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering approaches by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z)
- SCONE: Surface Coverage Optimization in Unknown Environments by Volumetric Integration [23.95135709027516]
Next Best View (NBV) computation is a long-standing problem in robotics.
We show that we can maximize surface metrics by Monte Carlo integration over a volumetric representation.
It takes as input an arbitrarily large point cloud gathered by a depth sensor, such as a Lidar system, together with camera poses, and predicts the NBV.
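A toy sketch of the Monte Carlo idea: estimate how much surface a candidate pose would cover by sampling points in the scene bounds and averaging occupancy times visibility. `occupancy_prob` and `visible_from` are hypothetical callbacks, not the paper's interface.

```python
import numpy as np

def mc_coverage_gain(occupancy_prob, visible_from, pose,
                     bounds, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the surface volume visible from `pose`:
    mean of occupancy * visibility, scaled by the sampling-box volume."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds                                 # (3,) arrays: box corners
    pts = rng.uniform(lo, hi, size=(n_samples, 3))
    weights = occupancy_prob(pts) * visible_from(pts, pose)
    return float(np.prod(hi - lo) * weights.mean())
```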
arXiv Detail & Related papers (2022-08-22T17:04:14Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real-world applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)