Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation
- URL: http://arxiv.org/abs/2602.13689v1
- Date: Sat, 14 Feb 2026 09:19:48 GMT
- Title: Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation
- Authors: Wonju Lee, Matteo Grimaldi, Tao Yu
- Abstract summary: We propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion. CMT integrates wrist-camera observations with tactile signals through structured self- and cross-attention. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate.
- Score: 7.104060092661104
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Insertion tasks in robotic manipulation demand precise, contact-rich interactions that vision alone cannot resolve. While tactile feedback is intuitively valuable, existing studies have shown that naïve visuo-tactile fusion often fails to deliver consistent improvements. In this work, we propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion that integrates wrist-camera observations with tactile signals through structured self- and cross-attention. To stabilize tactile embeddings, we further introduce a physics-informed regularization that encourages bilateral force balance, reflecting principles of human motor control. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate, surpassing naïve and gated fusion baselines and closely matching the privileged "wrist + contact force" configuration (96.09%). These results highlight two central insights: (i) tactile sensing is indispensable for precise alignment, and (ii) principled multimodal fusion, further strengthened by physics-informed regularization, unlocks complementary strengths of vision and touch, approaching privileged performance under realistic sensing.
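The abstract suggests two components: structured self- and cross-attention fusion, and a bilateral force-balance regularizer. Below is a minimal PyTorch-style sketch of both; the module layout, dimensions, and the exact form of the symmetry penalty are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossModalTransformer(nn.Module):
    """Illustrative visuo-tactile fusion in the spirit of CMT: per-modality
    self-attention, then bidirectional cross-attention between wrist-camera
    and tactile tokens. Sizes and depth are assumptions, not the paper's."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.vis_self = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.tac_self = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.vis_from_tac = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tac_from_vis = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis_tokens, tac_tokens):
        v = self.vis_self(vis_tokens)            # wrist-camera tokens (B, Nv, dim)
        t = self.tac_self(tac_tokens)            # tactile tokens (B, Nt, dim)
        v2, _ = self.vis_from_tac(v, t, t)       # vision queries attend to touch
        t2, _ = self.tac_from_vis(t, v, v)       # touch queries attend to vision
        return torch.cat([v2.mean(1), t2.mean(1)], dim=-1)  # fused embedding

def bilateral_balance_loss(f_left, f_right):
    """Hypothetical symmetry regularizer: in a stable two-finger grasp the
    contact forces on opposing pads should roughly cancel, so penalize the
    residual net force. f_left, f_right: (B, 3) per-finger force estimates."""
    return (f_left + f_right).pow(2).sum(dim=-1).mean()

# Sketch of how the term would enter training (lam is a tuning weight):
# loss = policy_loss + lam * bilateral_balance_loss(f_l, f_r)
```

The balance term assumes TacSL-style two-finger sensing, where stable insertion implies the opposing pad forces roughly cancel.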
Related papers
- FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation [8.726448573057725]
We present Force-Distilled VLA, a novel framework that integrates force awareness into contact-rich manipulation.
The core of our approach is a Force Distillation Module (FDM), which distills force information through a learnable query token.
During inference, this distilled force token is injected into the pretrained VLM, enabling force-aware reasoning.
arXiv Detail & Related papers (2026-02-02T14:19:46Z)
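A minimal sketch of what "distilling force through a learnable query token" could look like (hypothetical: the layer layout, dimensions, and names are illustrative, not FD-VLA's released code):

```python
import torch
import torch.nn as nn

class ForceDistillationModule(nn.Module):
    """Hypothetical reading of an FDM: a single learnable query cross-attends
    to encoded force/torque features, producing one distilled force token
    that can be injected into the pretrained VLM's input sequence."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, force_feats):
        # force_feats: (B, T, dim) encoded force/torque history
        q = self.query.expand(force_feats.size(0), -1, -1)
        token, _ = self.attn(q, force_feats, force_feats)
        return token  # (B, 1, dim) distilled force token
```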
- Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation [12.509181374985936]
Human-like dexterous hands with multiple fingers offer human-level manipulation capabilities.
However, training control policies that can be deployed directly on real hardware remains difficult due to contact-rich physics.
We present a practical framework that utilizes dense tactile feedback combined with joint torque sensing to regulate physical interactions.
arXiv Detail & Related papers (2026-01-06T07:26:39Z)
- OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction [93.88239833545623]
We present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset.
We show that tactile signals provide a compact yet powerful cue for grasp understanding.
We aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
arXiv Detail & Related papers (2025-12-18T18:18:17Z)
- Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation [21.78866976181311]
See-through-skin (STS) sensors combine tactile and visual perception.
Existing STS designs lack simultaneous multimodal perception and suffer from unreliable tactile tracking.
We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction.
arXiv Detail & Related papers (2025-12-10T17:35:13Z)
- Enhancing Tactile-based Reinforcement Learning for Robotic Control [32.565866574593635]
We develop self-supervised learning (SSL) methodologies to more effectively harness tactile observations.
We empirically demonstrate that sparse binary tactile signals are critical for dexterity.
We release the Robot Tactile Olympiad (RoTO) benchmark to standardise and promote future research in tactile-based manipulation.
arXiv Detail & Related papers (2025-10-24T16:15:05Z)
- TranTac: Leveraging Transient Tactile Signals for Contact-Rich Robotic Manipulation [11.834021644402148]
Robotic manipulation tasks such as inserting a key into a lock or plugging a USB device into a port can fail when visual perception is insufficient to detect misalignment.
Here, we introduce TranTac, a data-efficient and low-cost tactile sensing and control framework.
Our customized sensing system can detect dynamic translational and torsional deformations at the micrometer scale.
arXiv Detail & Related papers (2025-09-20T06:25:59Z)
- ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation [62.58034332427291]
ForceVLA is a novel end-to-end manipulation framework.
It treats external force sensing as a first-class modality within VLA systems.
arXiv Detail & Related papers (2025-05-28T09:24:25Z)
- Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor [14.492202828369127]
We leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks.
We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL.
Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%.
arXiv Detail & Related papers (2023-11-02T14:02:42Z)
- Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
arXiv Detail & Related papers (2021-08-11T03:49:59Z)
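As a toy stand-in for the elastic regulation idea (a linear-spring form is assumed here purely for illustration; EIP's actual elasticity model is more involved):

```python
import torch

def elastic_restoring_force(pos, rest, k=50.0):
    """Toy elastic regulation: each sensor particle is pulled back toward
    its rest position, bounding contact-induced deformation. Assumed
    linear-spring form, not EIP's model.
    pos, rest: (N, 3) current and rest particle positions; k: stiffness."""
    return -k * (pos - rest)

# Per simulation step, the net force on each particle would combine this
# restoring term with external contact forces before integrating positions.
```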
- Learning Compliance Adaptation in Contact-Rich Manipulation [81.40695846555955]
We propose a novel approach for learning predictive models of force profiles required for contact-rich tasks.
The approach combines anomaly detection based on Bidirectional Gated Recurrent Units (Bi-GRU) with an adaptive force/impedance controller.
arXiv Detail & Related papers (2020-05-01T05:23:34Z)
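A rough sketch of the anomaly-detection half (a reconstruction-style Bi-GRU is assumed here; the paper's exact architecture and thresholding are not reproduced):

```python
import torch
import torch.nn as nn

class BiGRUForceAnomaly(nn.Module):
    """Illustrative Bi-GRU anomaly detector over force profiles: the network
    predicts the expected wrench sequence, and a large per-step error acts
    as an anomaly score that could trigger the adaptive force/impedance
    controller. Sizes and the score definition are assumptions."""
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, in_dim)

    def forward(self, wrench_seq):
        # wrench_seq: (B, T, 6) measured force/torque profile
        h, _ = self.gru(wrench_seq)
        pred = self.head(h)                           # reconstructed profile
        score = (pred - wrench_seq).pow(2).mean(-1)   # (B, T) anomaly score
        return pred, score
```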
- OmniTact: A Multi-Directional High Resolution Touch Sensor [109.28703530853542]
Existing tactile sensors are either flat, have small sensitive fields or only provide low-resolution signals.
We introduce OmniTact, a multi-directional high-resolution tactile sensor.
We evaluate the capabilities of OmniTact on a challenging robotic control task.
arXiv Detail & Related papers (2020-03-16T01:31:29Z)
- The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes? [57.366931129764815]
We collect more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger.
Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.
arXiv Detail & Related papers (2017-10-16T05:32:38Z)