FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
- URL: http://arxiv.org/abs/2506.08822v1
- Date: Tue, 10 Jun 2025 14:12:53 GMT
- Title: FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
- Authors: Yifei Su, Ning Liu, Dong Chen, Zhen Zhao, Kun Wu, Meng Li, Zhiyuan Xu, Zhengping Che, Jian Tang,
- Abstract summary: We propose FreqPolicy to exploit temporal information in robotic manipulation.<n>FreqPolicy first imposes frequency consistency constraints on flow-based visuomotor policies.<n>We show efficiency and effectiveness in real-world robotic scenarios with an inference frequency 93.5Hz.
- Score: 34.81668269819768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative modeling-based visuomotor policies have been widely adopted in robotic manipulation attributed to their ability to model multimodal action distributions. However, the high inference cost of multi-step sampling limits their applicability in real-time robotic systems. To address this issue, existing approaches accelerate the sampling process in generative modeling-based visuomotor policies by adapting acceleration techniques originally developed for image generation. Despite this progress, a major distinction remains: image generation typically involves producing independent samples without temporal dependencies, whereas robotic manipulation involves generating time-series action trajectories that require continuity and temporal coherence. To effectively exploit temporal information in robotic manipulation, we propose FreqPolicy, a novel approach that first imposes frequency consistency constraints on flow-based visuomotor policies. Our work enables the action model to capture temporal structure effectively while supporting efficient, high-quality one-step action generation. We introduce a frequency consistency constraint that enforces alignment of frequency-domain action features across different timesteps along the flow, thereby promoting convergence of one-step action generation toward the target distribution. In addition, we design an adaptive consistency loss to capture structural temporal variations inherent in robotic manipulation tasks. We assess FreqPolicy on 53 tasks across 3 simulation benchmarks, proving its superiority over existing one-step action generators. We further integrate FreqPolicy into the vision-language-action (VLA) model and achieve acceleration without performance degradation on the 40 tasks of Libero. Besides, we show efficiency and effectiveness in real-world robotic scenarios with an inference frequency 93.5Hz. The code will be publicly available.
Related papers
- SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration [69.54069477520534]
Vision-Language-Action (VLA) models have attracted increasing attention for their strong control capabilities.<n>Their high computational cost and low execution frequency hinder their suitability for real-time tasks such as robotic manipulation and autonomous navigation.<n>We propose SP-VLA, a unified framework that accelerates VLA models by jointly scheduling models and pruning tokens.
arXiv Detail & Related papers (2025-06-15T05:04:17Z) - FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens [20.715024408481973]
We propose a novel paradigm for visuomotor policy learning that progressively models hierarchical frequency components.<n>To further enhance precision, we introduce continuous latent representations that maintain smoothness and continuity in the action space.<n>Our approach outperforms existing methods in both accuracy and efficiency.
arXiv Detail & Related papers (2025-06-02T12:13:51Z) - Action Flow Matching for Continual Robot Learning [57.698553219660376]
Continual learning in robotics seeks systems that can constantly adapt to changing environments and tasks.<n>We introduce a generative framework leveraging flow matching for online robot dynamics model alignment.<n>We find that by transforming the actions themselves rather than exploring with a misaligned model, the robot collects informative data more efficiently.
arXiv Detail & Related papers (2025-04-25T16:26:15Z) - Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.<n>Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.<n>Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces.
arXiv Detail & Related papers (2025-03-25T15:19:56Z) - FAST: Efficient Action Tokenization for Vision-Language-Action Models [98.15494168962563]
We propose a new compression-based tokenization scheme for robot actions, based on the discrete cosine transform.<n>Based on FAST, we release FAST+, a universal robot action tokenizer, trained on 1M real robot action trajectories.
arXiv Detail & Related papers (2025-01-16T18:57:04Z) - CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction [28.761494362934087]
Coarse-to-Fine AutoRegressive Policy (CARP) is a novel paradigm for visuomotor policy learning.<n>It redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach.<n>CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies.
arXiv Detail & Related papers (2024-12-09T18:59:18Z) - One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation [80.71541671907426]
OneStep Diffusion Policy (OneDP) is a novel approach that distills knowledge from pre-trained diffusion policies into a single-step action generator.
OneDP significantly accelerates response times for robotic control tasks.
arXiv Detail & Related papers (2024-10-28T17:54:31Z) - Diffusion Transformer Policy [48.50988753948537]
We propose a large multi-modal diffusion transformer, dubbed as Diffusion Transformer Policy, to model continuous end-effector actions.<n>By leveraging the scaling capability of transformers, the proposed approach can effectively model continuous end-effector actions across large diverse robot datasets.
arXiv Detail & Related papers (2024-10-21T12:43:54Z) - ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation [18.209973947319316]
Diffusion models have been verified to be effective in generating complex distributions from natural images to motion trajectories.<n>Recent methods show impressive performance in 3D robotic manipulation tasks, whereas they suffer from severe runtime inefficiency due to multiple denoising steps.<n>We propose a real-time robotic manipulation model named ManiCM that imposes the consistency constraint to the diffusion process.
arXiv Detail & Related papers (2024-06-03T17:59:23Z) - VAE-Loco: Versatile Quadruped Locomotion by Learning a Disentangled Gait
Representation [78.92147339883137]
We show that it is pivotal in increasing controller robustness by learning a latent space capturing the key stance phases constituting a particular gait.
We demonstrate that specific properties of the drive signal map directly to gait parameters such as cadence, footstep height and full stance duration.
The use of a generative model facilitates the detection and mitigation of disturbances to provide a versatile and robust planning framework.
arXiv Detail & Related papers (2022-05-02T19:49:53Z) - Next Steps: Learning a Disentangled Gait Representation for Versatile
Quadruped Locomotion [69.87112582900363]
Current planners are unable to vary key gait parameters continuously while the robot is in motion.
In this work we address this limitation by learning a latent space capturing the key stance phases constituting a particular gait.
We demonstrate that specific properties of the drive signal map directly to gait parameters such as cadence, foot step height and full stance duration.
arXiv Detail & Related papers (2021-12-09T10:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.