Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
- URL: http://arxiv.org/abs/2507.15062v1
- Date: Sun, 20 Jul 2025 17:53:59 GMT
- Title: Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
- Authors: Xinyue Zhu, Binghao Huang, Yunzhu Li
- Abstract summary: We present a portable, lightweight gripper with integrated tactile sensors. We propose a cross-modal representation learning framework that integrates visual and tactile signals. We validate our approach on fine-grained tasks such as test tube insertion and pipette-based fluid transfer.
- Score: 7.618517580705364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse, real-world, and in-the-wild settings. Building on this hardware, we propose a cross-modal representation learning framework that integrates visual and tactile signals while preserving their distinct characteristics. The learning procedure allows the emergence of interpretable representations that consistently focus on contacting regions relevant for physical interactions. When used for downstream manipulation tasks, these representations enable more efficient and effective policy learning, supporting precise robotic manipulation based on multimodal feedback. We validate our approach on fine-grained tasks such as test tube insertion and pipette-based fluid transfer, demonstrating improved accuracy and robustness under external disturbances. Our project page is available at https://binghao-huang.github.io/touch_in_the_wild/ .
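The abstract does not spell out the architecture, but the described pipeline (modality-specific encoders whose fused features drive a manipulation policy) can be sketched as follows. This is a minimal illustration assuming separate vision and tactile encoders with a small transformer fusing one token per modality; all module names, sizes, and the fusion choice are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a visuo-tactile policy with per-modality encoders
# (assumed design, not the paper's actual architecture). PyTorch.
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Encodes a low-resolution tactile pressure map into one feature token."""
    def __init__(self, in_ch=1, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

class VisionEncoder(nn.Module):
    """Encodes an RGB wrist-camera image into one feature token."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
    def forward(self, x):
        return self.net(x)

class VisuoTactilePolicy(nn.Module):
    """Keeps per-modality tokens, fuses them with self-attention,
    then predicts an action (e.g., end-effector delta + gripper width)."""
    def __init__(self, dim=128, action_dim=7):
        super().__init__()
        self.vision = VisionEncoder(dim)
        self.tactile = TactileEncoder(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, action_dim)
    def forward(self, rgb, touch_left, touch_right):
        tokens = torch.stack(
            [self.vision(rgb), self.tactile(touch_left), self.tactile(touch_right)],
            dim=1)                               # (B, 3, dim): one token per input
        fused = self.fusion(tokens).mean(dim=1)
        return self.head(fused)

policy = VisuoTactilePolicy()
action = policy(torch.randn(2, 3, 96, 96),       # RGB image
                torch.randn(2, 1, 16, 16),       # left fingertip tactile map
                torch.randn(2, 1, 16, 16))       # right fingertip tactile map
print(action.shape)                              # torch.Size([2, 7])
```

Keeping one token per modality before fusion is one simple way to preserve the distinct characteristics of each signal, in the spirit of the abstract; the actual framework and training objective may differ.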
Related papers
- 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing [18.189782619503074]
This paper introduces 3D-ViTac, a multi-modal sensing and learning system for robots. Our system features tactile sensors equipped with dense sensing units, each covering an area of 3 mm². We show that even low-cost robots can perform precise manipulations and significantly outperform vision-only policies.
arXiv Detail & Related papers (2024-10-31T16:22:53Z) - Learning Visuotactile Skills with Two Multifingered Hands [80.99370364907278]
We explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data.
Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data.
arXiv Detail & Related papers (2024-04-25T17:59:41Z) - Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing [15.970078821894758]
We introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation.
Robot Synesthesia is a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia.
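As a rough illustration of a point-cloud-style tactile representation (not the paper's exact formulation), the sketch below lifts activated taxels into world-frame points carrying a pressure feature and merges them with a camera point cloud; the taxel layout, threshold, and frame handling are assumptions.

```python
# Sketch: unify tactile readings and camera points into one point cloud
# (assumed scheme, not the paper's exact formulation). NumPy only.
import numpy as np

def tactile_to_points(taxel_positions, taxel_readings, finger_pose, threshold=0.05):
    """Lift activated taxels into world-frame points with a pressure feature.

    taxel_positions: (N, 3) taxel coordinates in the finger frame
    taxel_readings:  (N,) normalized pressure values in [0, 1]
    finger_pose:     (4, 4) homogeneous finger-to-world transform
    """
    active = taxel_readings > threshold
    pts_local = np.hstack([taxel_positions[active],
                           np.ones((active.sum(), 1))])     # homogeneous coords
    pts_world = (finger_pose @ pts_local.T).T[:, :3]
    feature = taxel_readings[active, None]                   # pressure as feature
    return np.hstack([pts_world, feature])                   # (M, 4): xyz + pressure

# Camera points carry a zero pressure feature so both clouds share one layout.
camera_points = np.random.rand(1024, 3)
camera_cloud = np.hstack([camera_points, np.zeros((1024, 1))])

taxels = np.random.rand(16, 3) * 0.01         # 16 taxels on a ~1 cm fingertip pad
readings = np.random.rand(16)
pose = np.eye(4)

fused_cloud = np.vstack([camera_cloud, tactile_to_points(taxels, readings, pose)])
print(fused_cloud.shape)   # (1024 + number of active taxels, 4)
```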
arXiv Detail & Related papers (2023-12-04T12:35:43Z) - Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor [14.492202828369127]
We leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%.
arXiv Detail & Related papers (2023-11-02T14:02:42Z) - The Power of the Senses: Generalizable Manipulation from Vision and
Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
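A toy sketch of masked multimodal autoencoding over visual and tactile patch tokens is shown below; the patch sizes, masking scheme, and network depth are illustrative assumptions rather than the M3L implementation.

```python
# Toy sketch of masked multimodal autoencoding over visual and tactile patches
# (inspired by the M3L description; sizes, masking ratio, and architecture are
# assumptions, not the authors' implementation). PyTorch.
import torch
import torch.nn as nn

class MaskedVisuoTactileAE(nn.Module):
    def __init__(self, vis_patch=192, tac_patch=64, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed_vis = nn.Linear(vis_patch, dim)   # flattened image patches
        self.embed_tac = nn.Linear(tac_patch, dim)   # flattened tactile patches
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode_vis = nn.Linear(dim, vis_patch)
        self.decode_tac = nn.Linear(dim, tac_patch)

    def forward(self, vis_patches, tac_patches):
        # Embed both modalities and concatenate into one token sequence.
        tokens = torch.cat([self.embed_vis(vis_patches),
                            self.embed_tac(tac_patches)], dim=1)
        # Randomly zero out a fraction of tokens (simplified masking).
        mask = (torch.rand(tokens.shape[:2], device=tokens.device)
                < self.mask_ratio).unsqueeze(-1)
        latent = self.encoder(tokens.masked_fill(mask, 0.0))
        n_vis = vis_patches.shape[1]
        rec_vis = self.decode_vis(latent[:, :n_vis])
        rec_tac = self.decode_tac(latent[:, n_vis:])
        # Reconstruction loss over all patches; an MAE-style loss would score
        # only the masked ones and feed the latent to the RL policy.
        loss = nn.functional.mse_loss(rec_vis, vis_patches) + \
               nn.functional.mse_loss(rec_tac, tac_patches)
        return loss, latent

model = MaskedVisuoTactileAE()
loss, latent = model(torch.randn(4, 36, 192), torch.randn(4, 16, 64))
print(loss.item(), latent.shape)   # scalar loss, torch.Size([4, 52, 128])
```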
arXiv Detail & Related papers (2023-11-02T01:33:00Z) - Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are now widely used for various robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z) - Visual-Tactile Multimodality for Following Deformable Linear Objects
Using Reinforcement Learning [15.758583731036007]
We study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects.
We train a reinforcement learning agent with different sensing modalities and investigate how its behaviour can be improved.
Our experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases.
arXiv Detail & Related papers (2022-03-31T21:59:08Z) - Learning to Detect Slip with Barometric Tactile Sensors and a Temporal
Convolutional Neural Network [7.346580429118843]
We present a learning-based method to detect slip using barometric tactile sensors.
We train a temporal convolutional neural network to detect slip, achieving high detection accuracies.
We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.
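A minimal sketch of such a temporal convolutional classifier is given below, assuming windows of multi-channel barometric pressure readings; the channel count, dilations, and hyperparameters are illustrative, not taken from the paper.

```python
# Sketch of a temporal convolutional classifier for slip detection from
# barometric tactile time series (assumed layout: C pressure channels over a
# short time window; hyperparameters are illustrative). PyTorch.
import torch
import torch.nn as nn

class SlipTCN(nn.Module):
    def __init__(self, channels=8, hidden=32):
        super().__init__()
        # Dilated 1D convolutions grow the receptive field over time.
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(hidden, 2),   # logits for {no slip, slip}
        )

    def forward(self, x):           # x: (batch, channels, time)
        return self.net(x)

model = SlipTCN()
window = torch.randn(16, 8, 50)     # 16 windows, 8 barometers, 50 timesteps
logits = model(window)
pred = logits.argmax(dim=1)         # 0 = no slip, 1 = slip
print(logits.shape, pred.shape)     # torch.Size([16, 2]) torch.Size([16])
```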
arXiv Detail & Related papers (2022-02-19T08:21:56Z) - Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z) - Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
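The sketch below gives a toy version of an elastic-particle tactile pad in the spirit of EIP: particles are pulled back to their rest positions by an elastic term and pushed out of a spherical indenter by a penalty force. The force model and constants are illustrative assumptions, not the paper's formulation.

```python
# Toy elastic-particle model of a tactile pad (loosely following the EIP idea
# of coordinated particles with an elastic restoring term; forces and constants
# here are illustrative, not the paper's model). NumPy only.
import numpy as np

def simulate_pad(rest_positions, indenter_center, indenter_radius,
                 stiffness=50.0, contact_k=200.0, steps=200, dt=1e-3):
    """Return displaced particle positions after pressing a sphere into the pad."""
    pos = rest_positions.copy()
    vel = np.zeros_like(pos)
    for _ in range(steps):
        # Elastic force pulls each particle back toward its rest position.
        f = -stiffness * (pos - rest_positions)
        # Penalty force pushes penetrating particles out of the spherical indenter.
        diff = pos - indenter_center
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        penetration = np.clip(indenter_radius - dist, 0.0, None)
        f += contact_k * penetration * diff / np.maximum(dist, 1e-9)
        vel = 0.9 * (vel + dt * f)          # damped explicit Euler step
        pos = pos + dt * vel
    return pos

# A 10x10 grid of particles on the z = 0 sensing surface (1 cm x 1 cm pad).
xs, ys = np.meshgrid(np.linspace(0, 0.01, 10), np.linspace(0, 0.01, 10))
rest = np.stack([xs.ravel(), ys.ravel(), np.zeros(100)], axis=1)

deformed = simulate_pad(rest, indenter_center=np.array([0.005, 0.005, 0.002]),
                        indenter_radius=0.003)
depth_map = (rest[:, 2] - deformed[:, 2]).reshape(10, 10)  # pseudo tactile image
print(depth_map.max())
```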
arXiv Detail & Related papers (2021-08-11T03:49:59Z) - Under Pressure: Learning to Detect Slip with Barometric Tactile Sensors [7.35805050004643]
We present a learning-based method to detect slip using barometric tactile sensors.
We are able to achieve slip detection accuracies of greater than 91%.
We show that barometric tactile sensing technology, combined with data-driven learning, is potentially suitable for many complex manipulation tasks.
arXiv Detail & Related papers (2021-03-24T19:29:03Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.