Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation
- URL: http://arxiv.org/abs/2602.13689v1
- Date: Sat, 14 Feb 2026 09:19:48 GMT
- Title: Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation
- Authors: Wonju Lee, Matteo Grimaldi, Tao Yu
- Abstract summary: We propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion. CMT integrates wrist-camera observations with tactile signals through structured self- and cross-attention. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate.
- Score: 7.104060092661104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Insertion tasks in robotic manipulation demand precise, contact-rich interactions that vision alone cannot resolve. While tactile feedback is intuitively valuable, existing studies have shown that naïve visuo-tactile fusion often fails to deliver consistent improvements. In this work, we propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion that integrates wrist-camera observations with tactile signals through structured self- and cross-attention. To stabilize tactile embeddings, we further introduce a physics-informed regularization that encourages bilateral force balance, reflecting principles of human motor control. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate, surpassing naïve and gated fusion baselines and closely matching the privileged "wrist + contact force" configuration (96.09%). These results highlight two central insights: (i) tactile sensing is indispensable for precise alignment, and (ii) principled multimodal fusion, further strengthened by physics-informed regularization, unlocks complementary strengths of vision and touch, approaching privileged performance under realistic sensing.
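The abstract suggests two components: structured self- and cross-attention fusion, and a bilateral force-balance regularizer. Below is a minimal PyTorch-style sketch of both; the module layout, dimensions, and the exact form of the symmetry penalty are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalTransformer(nn.Module):
    """Illustrative visuo-tactile fusion in the spirit of CMT: per-modality
    self-attention, then bidirectional cross-attention between wrist-camera
    and tactile tokens. Sizes and depth are assumptions, not the paper's."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.vis_self = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.tac_self = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.vis_from_tac = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tac_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis_tokens, tac_tokens):
        v = self.vis_self(vis_tokens)            # wrist-camera tokens (B, Nv, dim)
        t = self.tac_self(tac_tokens)            # tactile tokens (B, Nt, dim)
        v2, _ = self.vis_from_tac(v, t, t)       # vision queries attend to touch
        t2, _ = self.tac_from_vis(t, v, v)       # touch queries attend to vision
        return torch.cat([v2.mean(1), t2.mean(1)], dim=-1)  # fused embedding

def bilateral_balance_loss(f_left, f_right):
    """Hypothetical symmetry regularizer: in a stable two-finger grasp the
    contact forces on opposing pads should roughly cancel, so penalize the
    residual net force. f_left, f_right: (B, 3) per-finger force estimates."""
    return (f_left + f_right).pow(2).sum(dim=-1).mean()

# Sketch of how the term would enter training (lam is a tuning weight):
# loss = policy_loss + lam * bilateral_balance_loss(f_l, f_r)
```

The balance term assumes TacSL-style two-finger sensing, where stable insertion implies the opposing pad forces roughly cancel.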
Related papers
- FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation [8.726448573057725]
We present Force-Distilled VLA, a novel framework that integrates force awareness into contact-rich manipulation.
The core of our approach is a Force Distillation Module (FDM), which distills force information through a learnable query token.
During inference, this distilled force token is injected into the pretrained VLM, enabling force-aware reasoning.
arXiv Detail & Related papers (2026-02-02T14:19:46Z)
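A minimal sketch of what "distilling force through a learnable query token" could look like (hypothetical: the layer layout, dimensions, and names are illustrative, not FD-VLA's released code):

```python
import torch
import torch.nn as nn

class ForceDistillationModule(nn.Module):
    """Hypothetical reading of an FDM: a single learnable query cross-attends
    to encoded force/torque features, producing one distilled force token
    that can be injected into the pretrained VLM's input sequence."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, force_feats):
        # force_feats: (B, T, dim) encoded force/torque history
        q = self.query.expand(force_feats.size(0), -1, -1)
        token, _ = self.attn(q, force_feats, force_feats)
        return token  # (B, 1, dim) distilled force token
```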
- Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation [12.509181374985936]
Human-like dexterous hands with multiple fingers offer human-level manipulation capabilities.
However, training control policies that can be deployed directly on real hardware remains difficult due to contact-rich physics.
We present a practical framework that utilizes dense tactile feedback combined with joint torque sensing to regulate physical interactions.
arXiv Detail & Related papers (2026-01-06T07:26:39Z)
- OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction [93.88239833545623]
We present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset.
We show that tactile signals provide a compact yet powerful cue for grasp understanding.
We aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
arXiv Detail & Related papers (2025-12-18T18:18:17Z)
- Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation [21.78866976181311]
See-through-skin (STS) sensors combine tactile and visual perception.
Existing STS designs lack simultaneous multimodal perception and suffer from unreliable tactile tracking.
We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction.
arXiv Detail & Related papers (2025-12-10T17:35:13Z)
- Enhancing Tactile-based Reinforcement Learning for Robotic Control [32.565866574593635]
We develop self-supervised learning (SSL) methodologies to more effectively harness tactile observations.
We empirically demonstrate that sparse binary tactile signals are critical for dexterity.
We release the Robot Tactile Olympiad (RoTO) benchmark to standardise and promote future research in tactile-based manipulation.
arXiv Detail & Related papers (2025-10-24T16:15:05Z)
- TranTac: Leveraging Transient Tactile Signals for Contact-Rich Robotic Manipulation [11.834021644402148]
Robotic manipulation tasks such as inserting a key into a lock or plugging a USB device into a port can fail when visual perception is insufficient to detect misalignment.
Here, we introduce TranTac, a data-efficient and low-cost tactile sensing and control framework.
Our customized sensing system can detect dynamic translational and torsional deformations at the micrometer scale.
arXiv Detail & Related papers (2025-09-20T06:25:59Z)
- ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation [62.58034332427291]
ForceVLA is a novel end-to-end manipulation framework.
It treats external force sensing as a first-class modality within VLA systems.
arXiv Detail & Related papers (2025-05-28T09:24:25Z)
- Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor [14.492202828369127]
We leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks.
We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL.
Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%.
arXiv Detail & Related papers (2023-11-02T14:02:42Z)
- Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
arXiv Detail & Related papers (2021-08-11T03:49:59Z)
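As a toy stand-in for the elastic regulation idea (a linear-spring form is assumed here purely for illustration; EIP's actual elasticity model is more involved):

```python
import torch

def elastic_restoring_force(pos, rest, k=50.0):
    """Toy elastic regulation: each sensor particle is pulled back toward
    its rest position, bounding contact-induced deformation. Assumed
    linear-spring form, not EIP's model.
    pos, rest: (N, 3) current and rest particle positions; k: stiffness."""
    return -k * (pos - rest)

# Per simulation step, the net force on each particle would combine this
# restoring term with external contact forces before integrating positions.
```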
- Learning Compliance Adaptation in Contact-Rich Manipulation [81.40695846555955]
We propose a novel approach for learning predictive models of force profiles required for contact-rich tasks.
The approach combines anomaly detection based on Bidirectional Gated Recurrent Units (Bi-GRU) with an adaptive force/impedance controller.
arXiv Detail & Related papers (2020-05-01T05:23:34Z)
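A rough sketch of the anomaly-detection half (a reconstruction-style Bi-GRU is assumed here; the paper's exact architecture and thresholding are not reproduced):

```python
import torch
import torch.nn as nn

class BiGRUForceAnomaly(nn.Module):
    """Illustrative Bi-GRU anomaly detector over force profiles: the network
    predicts the expected wrench sequence, and a large per-step error acts
    as an anomaly score that could trigger the adaptive force/impedance
    controller. Sizes and the score definition are assumptions."""
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, in_dim)

    def forward(self, wrench_seq):
        # wrench_seq: (B, T, 6) measured force/torque profile
        h, _ = self.gru(wrench_seq)
        pred = self.head(h)                           # reconstructed profile
        score = (pred - wrench_seq).pow(2).mean(-1)   # (B, T) anomaly score
        return pred, score
```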
- OmniTact: A Multi-Directional High Resolution Touch Sensor [109.28703530853542]
Existing tactile sensors are either flat, have small sensitive fields or only provide low-resolution signals.
We introduce OmniTact, a multi-directional high-resolution tactile sensor.
We evaluate the capabilities of OmniTact on a challenging robotic control task.
arXiv Detail & Related papers (2020-03-16T01:31:29Z)
- The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes? [57.366931129764815]
We collect more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger.
Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.
arXiv Detail & Related papers (2017-10-16T05:32:38Z)