3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing
- URL: http://arxiv.org/abs/2410.24091v1
- Date: Thu, 31 Oct 2024 16:22:53 GMT
- Title: 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing
- Authors: Binghao Huang, Yixuan Wang, Xinyi Yang, Yiyue Luo, Yunzhu Li,
- Abstract summary: This paper introduces textbf3D-ViTac, a multi-modal sensing and learning system for robots.
Our system features tactile sensors equipped with dense sensing units, each covering an area of 3$mm2$.
We show that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies.
- Score: 18.189782619503074
- License:
- Abstract: Tactile and visual perception are both crucial for humans to perform fine-grained interactions with their environment. Developing similar multi-modal sensing capabilities for robots can significantly enhance and expand their manipulation skills. This paper introduces \textbf{3D-ViTac}, a multi-modal sensing and learning system designed for dexterous bimanual manipulation. Our system features tactile sensors equipped with dense sensing units, each covering an area of 3$mm^2$. These sensors are low-cost and flexible, providing detailed and extensive coverage of physical contacts, effectively complementing visual information. To integrate tactile and visual data, we fuse them into a unified 3D representation space that preserves their 3D structures and spatial relationships. The multi-modal representation can then be coupled with diffusion policies for imitation learning. Through concrete hardware experiments, we demonstrate that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies, particularly in safe interactions with fragile items and executing long-horizon tasks involving in-hand manipulation. Our project page is available at \url{https://binghao-huang.github.io/3D-ViTac/}.
Related papers
- Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins [17.412763585521688]
We present the Visuo-Skin (ViSk) framework, a simple approach that uses a transformer-based policy and treats skin sensor data as additional tokens alongside visual information.
ViSk significantly outperforms both vision-only and optical tactile sensing based policies.
Further analysis reveals that combining tactile and visual modalities enhances policy performance and spatial generalization, achieving an average improvement of 27.5% across tasks.
arXiv Detail & Related papers (2024-10-22T17:59:49Z) - DexTouch: Learning to Seek and Manipulate Objects with Tactile Dexterity [11.450027373581019]
We introduce a multi-finger robot system designed to manipulate objects using the sense of touch, without relying on vision.
For tasks that mimic daily life, the robot uses its sense of touch to manipulate randomly placed objects in dark.
arXiv Detail & Related papers (2024-01-23T05:37:32Z) - MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in
3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z) - Neural feels with neural fields: Visuo-tactile perception for in-hand
manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z) - Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing [15.970078821894758]
We introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation.
Robot Synesthesia is a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia.
arXiv Detail & Related papers (2023-12-04T12:35:43Z) - The Power of the Senses: Generalizable Manipulation from Vision and
Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
arXiv Detail & Related papers (2023-11-02T01:33:00Z) - See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation [49.925499720323806]
We study how visual, auditory, and tactile perception can jointly help robots to solve complex manipulation tasks.
We build a robot system that can see with a camera, hear with a contact microphone, and feel with a vision-based tactile sensor.
arXiv Detail & Related papers (2022-12-07T18:55:53Z) - Touch and Go: Learning from Human-Collected Vision and Touch [16.139106833276]
We propose a dataset with paired visual and tactile data called Touch and Go.
Human data collectors probe objects in natural environments using tactile sensors.
Our dataset spans a large number of "in the wild" objects and scenes.
arXiv Detail & Related papers (2022-11-22T18:59:32Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.