Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor
- URL: http://arxiv.org/abs/2311.01248v5
- Date: Sun, 26 Jan 2025 15:03:06 GMT
- Title: Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor
- Authors: Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek
- Abstract summary: We leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks.
We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL.
Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%.
- Score: 14.492202828369127
- Abstract: Contact-rich tasks continue to present many challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks that involve relative motion (e.g., slipping and sliding) between the end-effector and the manipulated object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door-opening tasks with a variety of observation and algorithm configurations to study the utility of multimodal visuotactile sensing and our proposed improvements. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to enable accurate task feedback. Project site: https://papers.starslab.ca/sts-il/
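To make the force-matching contribution concrete, here is a minimal sketch, not the authors' implementation: it assumes the demonstration is stored as end-effector waypoints plus approximate contact forces read from the see-through sensor, and applies a simple proportional, admittance-style offset so that replaying the adapted waypoints recreates the recorded forces. The function name, the gain value, and the proportional correction rule are all illustrative assumptions.

```python
# Illustrative sketch of tactile force matching (assumed, not the paper's code).
# During kinesthetic teaching the human back-drives the robot, so naively
# replaying the recorded positions reproduces little of the contact force the
# demonstrator applied. One simple fix is a proportional offset along the
# direction of the missing force.
import numpy as np

def force_matched_trajectory(positions, demo_forces, replay_forces, gain=1e-3):
    """Adapt demonstrated waypoints so replay recreates the recorded forces.

    positions:     (T, 3) demonstrated end-effector positions [m]
    demo_forces:   (T, 3) approximate forces sensed during teaching [N]
    replay_forces: (T, 3) forces measured when replaying `positions` [N]
    gain:          compliance-like scale [m/N]; value is a placeholder
    """
    force_error = np.asarray(demo_forces) - np.asarray(replay_forces)
    return np.asarray(positions) + gain * force_error  # nudge into contact
```

Learned mode switching can then be viewed as one extra binary output of the cloned policy: alongside the action, the network predicts whether the see-through sensor should operate in visual or tactile mode, so the transition from reaching (visual) to contacting (tactile) is learned from demonstrations rather than hand-scheduled.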
Related papers
- Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins [17.412763585521688]
We present the Visuo-Skin (ViSk) framework, a simple approach that uses a transformer-based policy and treats skin sensor data as additional tokens alongside visual information.
ViSk significantly outperforms both vision-only policies and policies based on optical tactile sensing.
Further analysis reveals that combining tactile and visual modalities enhances policy performance and spatial generalization, achieving an average improvement of 27.5% across tasks.
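A hedged sketch of the ViSk token treatment, under assumed shapes and module choices: skin-sensor readings are projected into the same embedding space as the visual patch tokens, and the concatenated sequence is fed to a standard transformer encoder. The dimensions, pooling, and layer counts below are illustrative, not the paper's architecture.

```python
# Sketch of treating skin readings as extra transformer tokens (assumed shapes).
import torch
import torch.nn as nn

class VisuoSkinPolicy(nn.Module):
    def __init__(self, img_dim=768, skin_dim=32, d_model=256, n_actions=7):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, d_model)    # visual patch tokens
        self.skin_proj = nn.Linear(skin_dim, d_model)  # skin reading -> 1 token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, n_actions)

    def forward(self, img_tok, skin):
        # img_tok: (B, n_patches, img_dim); skin: (B, skin_dim)
        tokens = torch.cat(
            [self.img_proj(img_tok), self.skin_proj(skin).unsqueeze(1)], dim=1)
        z = self.encoder(tokens).mean(dim=1)  # pooled multimodal representation
        return self.action_head(z)            # predicted action
```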
arXiv Detail & Related papers (2024-10-22T17:59:49Z)
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose MPI, a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
Experimental results demonstrate that MPI improves by 10% to 64% over the previous state of the art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Learning Visuotactile Skills with Two Multifingered Hands [80.99370364907278]
We explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data.
Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data.
arXiv Detail & Related papers (2024-04-25T17:59:41Z)
- The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning [60.91637862768949]
We propose Masked Multimodal Learning (M3L) to fuse visual and tactile information in a reinforcement learning setting.
M3L learns a policy and visual-tactile representations based on masked autoencoding.
We evaluate M3L on three simulated environments with both visual and tactile observations.
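A minimal sketch of masked multimodal pre-training in the spirit of M3L: visual and tactile tokens are concatenated, a random subset is masked, and the model is trained to reconstruct the masked inputs. For brevity this uses BERT-style mask tokens rather than MAE's visible-only encoder, and all shapes, layer counts, and the masking ratio are assumptions.

```python
# Assumed sketch of masked visual-tactile representation learning.
import torch
import torch.nn as nn

class MaskedMultimodalAE(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.recon_head = nn.Linear(d_model, d_model)      # toy reconstruction
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, vis_tok, tac_tok, mask_ratio=0.75):
        x = torch.cat([vis_tok, tac_tok], dim=1)           # fuse modalities
        B, N, D = x.shape
        mask = torch.rand(B, N, device=x.device) < mask_ratio
        x_in = torch.where(mask.unsqueeze(-1),             # replace masked
                           self.mask_token.expand(B, N, D), x)
        recon = self.recon_head(self.encoder(x_in))
        return ((recon - x) ** 2)[mask].mean()             # loss on masked only
```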
arXiv Detail & Related papers (2023-11-02T01:33:00Z)
- MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation [8.738889129462013]
"MimicTouch" is a novel framework for learning policies directly from demonstrations provided by human users with their hands.
The key innovations are i) a human tactile data collection system which collects multi-modal tactile dataset for learning human's tactile-guided control strategy, and ii) an imitation learning-based framework for learning human's tactile-guided control strategy through such data.
arXiv Detail & Related papers (2023-10-25T18:34:06Z)
- Tactile-Filter: Interactive Tactile Perception for Part Mating [54.46221808805662]
Humans rely on touch and tactile sensing for many dexterous manipulation tasks.
Vision-based tactile sensors are now widely used for a variety of robotic perception and control tasks.
We present a method for interactive perception using vision-based tactile sensors for a part mating task.
arXiv Detail & Related papers (2023-03-10T16:27:37Z)
- Visual-Tactile Multimodality for Following Deformable Linear Objects Using Reinforcement Learning [15.758583731036007]
We study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects.
We train a reinforcement learning agent with different combinations of sensing modalities and investigate how its behaviour can be improved.
Our experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases.
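The modality study can be pictured with a small, hypothetical observation builder: each sensing stream is a feature vector, and disabling a modality zero-fills its slot so the agent's input size stays fixed across ablations. The modality names and the zero-filling scheme are illustrative assumptions, not the paper's setup.

```python
# Assumed sketch of assembling a multimodal RL observation for ablations.
import numpy as np

MODALITIES = ("vision", "tactile", "proprio")  # hypothetical feature streams

def build_observation(features, use=MODALITIES):
    """Concatenate selected modality features into one flat policy input.

    features: dict mapping modality name -> 1-D numpy array
    use:      modalities to enable; disabled ones are zero-filled so the
              observation size is identical across ablation runs
    """
    parts = []
    for name in MODALITIES:
        x = np.asarray(features[name], dtype=np.float32)
        parts.append(x if name in use else np.zeros_like(x))
    return np.concatenate(parts)
```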
arXiv Detail & Related papers (2022-03-31T21:59:08Z)
- Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation [54.31414116478024]
In mobile manipulation (MM), robots can both navigate within and interact with their environment.
In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks.
arXiv Detail & Related papers (2021-12-09T23:54:59Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate our approach on two challenging tasks, non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)