Tactile-Augmented Radiance Fields
- URL: http://arxiv.org/abs/2405.04534v1
- Date: Tue, 7 May 2024 17:59:50 GMT
- Title: Tactile-Augmented Radiance Fields
- Authors: Yiming Dou, Fengyu Yang, Yi Liu, Antonio Loquercio, Andrew Owens
- Abstract summary: We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space.
This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene.
We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes.
- Score: 23.3063261842082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space. This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene. We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes. Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features. We use these insights to register touch signals to a captured visual scene, and to train a conditional diffusion model that, provided with an RGB-D image rendered from a neural radiance field, generates its corresponding tactile signal. To evaluate our approach, we collect a dataset of TaRFs. This dataset contains more touch samples than previous real-world datasets, and it provides spatially aligned visual signals for each captured touch signal. We demonstrate the accuracy of our cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Project page: https://dou-yiming.github.io/TaRF
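Insight (i) in the abstract rests on standard multi-view geometry: once a camera's pose is known, a 3D point in the scene can be mapped to a pixel in that camera's image. The following is a minimal sketch of that pinhole projection only, not the paper's registration pipeline. The function names, the pose, the intrinsics, and the touch location are all hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(point: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Apply the rigid transform x_cam = R @ x_world + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point_cam: Vec3, fx: float, fy: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical camera pose: identity rotation, scene origin 2 m in front.
R = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
t = (0.0, 0.0, 2.0)

# Hypothetical 3D location of a touch probe in scene coordinates (metres).
touch_point = (0.1, -0.05, 0.0)

# Hypothetical intrinsics: focal lengths and principal point in pixels.
pixel = project(world_to_camera(touch_point, R, t),
                fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pixel)
```

In a registered scene, a correspondence of this kind is what would allow each touch sample to be paired with the visual signal at the same location.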
Related papers
- Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting [13.895893586777802]
We propose a novel method to supervise 3D Gaussian Splatting scenes using optical tactile sensors.
We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone.
arXiv Detail & Related papers (2024-03-14T21:09:59Z)
- TouchSDF: A DeepSDF Approach for 3D Shape Reconstruction using Vision-Based Tactile Sensing [29.691786688595762]
Humans rely on their visual and tactile senses to develop a comprehensive 3D understanding of their physical environment.
We propose TouchSDF, a Deep Learning approach for tactile 3D shape reconstruction.
Our technique consists of two components: (1) a Convolutional Neural Network that maps tactile images into local meshes representing the surface at the touch location, and (2) an implicit neural function that predicts a signed distance function to extract the desired 3D shape.
arXiv Detail & Related papers (2023-11-21T13:43:06Z)
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation [26.726658200149544]
We propose "Implicit Ray-Transformer (IRT)" based on Implicit Neural Representation (INR) for RS scene semantic segmentation with sparse labels.
The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene.
In the second stage, we design a Ray Transformer to leverage the relations between the neural field 3D features and 2D texture features for learning better semantic representations.
arXiv Detail & Related papers (2023-03-15T07:05:07Z)
- One-Shot Neural Fields for 3D Object Understanding [112.32255680399399]
We present a unified and compact scene representation for robotics.
Each object in the scene is depicted by a latent code capturing geometry and appearance.
This representation can be decoded for various tasks such as novel view rendering, 3D reconstruction, and stable grasp prediction.
arXiv Detail & Related papers (2022-10-21T17:33:14Z)
- Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations [29.756718435405983]
Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis.
Existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views.
We introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation.
arXiv Detail & Related papers (2022-10-20T11:13:50Z)
- PeRFception: Perception using Radiance Fields [72.99583614735545]
We create the first large-scale implicit representation datasets for perception tasks, called PeRFception.
It shows a significant memory compression rate (96.4%) from the original dataset, while containing both 2D and 3D information in a unified form.
We construct the classification and segmentation models that directly take as input this implicit format and also propose a novel augmentation technique to avoid overfitting on backgrounds of images.
arXiv Detail & Related papers (2022-08-24T13:32:46Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Supervising Remote Sensing Change Detection Models with 3D Surface Semantics [1.8782750537161614]
We propose Contrastive Surface-Image Pretraining (CSIP) for joint learning using optical RGB and above ground level (AGL) map pairs.
We then evaluate these pretrained models on several building segmentation and change detection datasets to show that our method does, in fact, extract features relevant to downstream applications.
arXiv Detail & Related papers (2022-02-26T23:35:43Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z)