Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation
- URL: http://arxiv.org/abs/2503.02881v3
- Date: Wed, 23 Apr 2025 10:45:38 GMT
- Title: Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation
- Authors: Han Xue, Jieji Ren, Wendi Chen, Gu Zhang, Yuan Fang, Guoying Gu, Huazhe Xu, Cewu Lu
- Abstract summary: Humans can accomplish contact-rich tasks using vision and touch, with highly reactive capabilities such as fast response to external changes and adaptive control of contact forces. Existing visual imitation learning approaches rely on action chunking to model complex behaviors. We introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality.
- Score: 58.95799126311524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can accomplish complex contact-rich tasks using vision and touch, with highly reactive capabilities such as fast response to external changes and adaptive control of contact forces; however, this remains challenging for robots. Existing visual imitation learning (IL) approaches rely on action chunking to model complex behaviors, which lacks the ability to respond instantly to real-time tactile feedback during the chunk execution. Furthermore, most teleoperation systems struggle to provide fine-grained tactile / force feedback, which limits the range of tasks that can be performed. To address these challenges, we introduce TactAR, a low-cost teleoperation system that provides real-time tactile feedback through Augmented Reality (AR), along with Reactive Diffusion Policy (RDP), a novel slow-fast visual-tactile imitation learning algorithm for learning contact-rich manipulation skills. RDP employs a two-level hierarchy: (1) a slow latent diffusion policy for predicting high-level action chunks in latent space at low frequency, (2) a fast asymmetric tokenizer for closed-loop tactile feedback control at high frequency. This design enables both complex trajectory modeling and quick reactive behavior within a unified framework. Through extensive evaluation across three challenging contact-rich tasks, RDP significantly improves performance compared to state-of-the-art visual IL baselines. Furthermore, experiments show that RDP is applicable across different tactile / force sensors. Code and videos are available on https://reactive-diffusion-policy.github.io.
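The abstract's two-level design can be pictured as a control loop: a slow branch refreshes a latent action chunk every few steps, while a fast branch converts that chunk plus the latest tactile reading into a low-level action at every step. The sketch below is only a hedged illustration of that loop; the module names, dimensions, and random linear maps are placeholders, not the RDP implementation (the paper's slow branch is a latent diffusion policy and its fast branch an asymmetric tokenizer).

```python
# Minimal sketch (not the authors' code) of the slow-fast control loop described
# in the abstract: a slow policy produces a latent action chunk at low frequency,
# while a fast decoder turns that latent plus real-time tactile feedback into
# low-level actions at high frequency. All names and dimensions are hypothetical.
import numpy as np

LATENT_DIM, TACTILE_DIM, ACTION_DIM = 32, 16, 7
SLOW_PERIOD = 8  # fast steps per slow-policy update

rng = np.random.default_rng(0)
W_slow = rng.normal(size=(LATENT_DIM,))                            # stands in for the slow latent policy
W_fast = rng.normal(size=(ACTION_DIM, LATENT_DIM + TACTILE_DIM))   # stands in for the fast decoder

def slow_policy(observation):
    """Predict a high-level action chunk in latent space (low frequency)."""
    return np.tanh(W_slow * observation.mean())

def fast_decoder(latent_chunk, tactile):
    """Closed-loop correction from real-time tactile feedback (high frequency)."""
    return W_fast @ np.concatenate([latent_chunk, tactile])

latent_chunk = None
for step in range(32):
    observation = rng.normal(size=(64,))        # camera / proprioception placeholder
    tactile = rng.normal(size=(TACTILE_DIM,))   # tactile / force reading placeholder
    if step % SLOW_PERIOD == 0:                 # slow branch: refresh the latent chunk
        latent_chunk = slow_policy(observation)
    action = fast_decoder(latent_chunk, tactile)  # fast branch: react at every step
    # `action` would be sent to the robot controller here
```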
Related papers
- PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies [4.6090500060386805]
PolyTouch is a novel robot finger that integrates camera-based tactile sensing, acoustic sensing, and peripheral visual sensing into a single design.
Experiments demonstrate a 20-fold increase in lifespan over commercial tactile sensors, with a design that is both easy to manufacture and scalable.
arXiv Detail & Related papers (2025-04-27T19:50:31Z) - Learning Precise, Contact-Rich Manipulation through Uncalibrated Tactile Skins [17.412763585521688]
We present the Visuo-Skin (ViSk) framework, a simple approach that uses a transformer-based policy and treats skin sensor data as additional tokens alongside visual information.
ViSk significantly outperforms both vision-only policies and policies based on optical tactile sensing.
Further analysis reveals that combining tactile and visual modalities enhances policy performance and spatial generalization, achieving an average improvement of 27.5% across tasks.
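The ViSk summary describes a transformer policy that treats skin-sensor readings as extra tokens alongside visual tokens. Below is a minimal sketch of that idea, assuming hypothetical feature sizes and projection layers; it is not the ViSk code.

```python
# Hedged sketch: tactile (skin) readings projected into the same token space as
# visual features and processed jointly by a transformer policy. Shapes, layer
# names, and the pooling choice are assumptions for illustration only.
import torch
import torch.nn as nn

class VisuoSkinPolicy(nn.Module):
    def __init__(self, img_feat_dim=512, skin_feat_dim=3, d_model=128, action_dim=7):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, d_model)    # visual features -> tokens
        self.skin_proj = nn.Linear(skin_feat_dim, d_model)  # per-taxel readings -> tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, img_feats, skin_feats):
        # img_feats: (B, num_patches, img_feat_dim); skin_feats: (B, num_taxels, skin_feat_dim)
        tokens = torch.cat([self.img_proj(img_feats), self.skin_proj(skin_feats)], dim=1)
        return self.head(self.encoder(tokens).mean(dim=1))  # pool tokens -> action

policy = VisuoSkinPolicy()
action = policy(torch.randn(1, 16, 512), torch.randn(1, 8, 3))  # -> (1, 7)
```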
arXiv Detail & Related papers (2024-10-22T17:59:49Z) - Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP).
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and code are available at https://forrest-110.io/sparse_diffusion_policy/.
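The phrase "selectively activates experts and skills" suggests a mixture-of-experts-style router that picks a subset of expert heads per input, so new tasks can reuse frozen experts rather than retraining the whole model. The sketch below is a hedged illustration of that routing idea with hypothetical names, shapes, and a top-1 gate; it is not the SDP implementation.

```python
# Hedged illustration of selective expert activation: a small gating network
# routes each observation to its single highest-scoring expert head.
# All module names and dimensions here are assumptions.
import torch
import torch.nn as nn

class SparseExpertPolicy(nn.Module):
    def __init__(self, obs_dim=64, action_dim=7, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(obs_dim, num_experts)   # router over experts
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, action_dim))
            for _ in range(num_experts)
        ])

    def forward(self, obs):
        # obs: (B, obs_dim); activate only the top-scoring expert per sample
        idx = self.gate(obs).argmax(dim=-1)
        return torch.stack([self.experts[i](o) for i, o in zip(idx.tolist(), obs)])

policy = SparseExpertPolicy()
actions = policy(torch.randn(2, 64))  # -> (2, 7)
```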
arXiv Detail & Related papers (2024-07-01T17:59:56Z) - RILe: Reinforced Imitation Learning [60.63173816209543]
RILe (Reinforced Imitation Learning) is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z) - Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor [14.492202828369127]
We leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact-rich tasks. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complementary methods for improving IL. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%.
arXiv Detail & Related papers (2023-11-02T14:02:42Z) - REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation [61.7171775202833]
We introduce an efficient system for learning dexterous manipulation skills with reinforcement learning.
The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping.
Our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy.
arXiv Detail & Related papers (2023-09-06T19:05:31Z) - Learning Robotic Manipulation Skills Using an Adaptive Force-Impedance Action Space [7.116986445066885]
Reinforcement Learning has led to promising results on a range of challenging decision-making tasks.
Fast human-like adaptive control methods can optimize complex robotic interactions, yet fail to integrate multimodal feedback needed for unstructured tasks.
We propose to factor the learning problem in a hierarchical learning and adaption architecture to get the best of both worlds.
arXiv Detail & Related papers (2021-10-19T12:09:02Z) - COCOI: Contact-aware Online Context Inference for Generalizable Non-planar Pushing [87.7257446869134]
General contact-rich manipulation problems are long-standing challenges in robotics.
Deep reinforcement learning has shown great potential in solving robot manipulation tasks.
We propose COCOI, a deep RL method that encodes a context embedding of dynamics properties online.
arXiv Detail & Related papers (2020-11-23T08:20:21Z) - Deep Reinforcement Learning for Contact-Rich Skills Using Compliant Movement Primitives [0.0]
Further integration of industrial robots is hampered by their limited flexibility, adaptability, and decision-making skills.
We propose different pruning methods that facilitate convergence and generalization.
We demonstrate that the proposed method can learn insertion skills that are invariant to space, size, shape, and closely related scenarios.
arXiv Detail & Related papers (2020-08-30T17:29:43Z)