HybridFlow: A Two-Step Generative Policy for Robotic Manipulation
- URL: http://arxiv.org/abs/2602.13718v1
- Date: Sat, 14 Feb 2026 10:50:23 GMT
- Title: HybridFlow: A Two-Step Generative Policy for Robotic Manipulation
- Authors: Zhenchen Dong, Jinna Fu, Jiaming Wu, Shengyuan Yu, Fulin Chen, Yide Liu,
- Abstract summary: MeanFlow, a one-step variant of flow matching, has shown strong potential in image generation. HybridFlow balances inference speed and generation quality by leveraging the rapid one-step generation of MeanFlow. We envision HybridFlow as a practical low-latency method to enhance the real-world interaction capabilities of robotic manipulation policies.
- Score: 2.2200541495683996
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Limited by inference latency, existing robot manipulation policies lack sufficient real-time interaction capability with the environment. Although faster generation methods such as flow matching are gradually replacing diffusion methods, researchers are pursuing even faster generation suitable for interactive robot control. MeanFlow, a one-step variant of flow matching, has shown strong potential in image generation, but its precision in action generation does not meet the stringent requirements of robotic manipulation. We therefore propose HybridFlow, a 3-stage method with 2-NFE: Global Jump in MeanFlow mode, ReNoise for distribution alignment, and Local Refine in ReFlow mode. This method balances inference speed and generation quality by leveraging the rapid one-step generation of MeanFlow while ensuring action precision with minimal generation steps. In real-world experiments, HybridFlow outperforms the 16-step Diffusion Policy by 15-25% in success rate while reducing inference time from 152 ms to 19 ms (8x speedup, ~52 Hz); it also achieves 70.0% success on unseen-color OOD grasping and 66.3% on deformable object folding. We envision HybridFlow as a practical low-latency method to enhance the real-world interaction capabilities of robotic manipulation policies.
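The three stages named in the abstract (Global Jump, ReNoise, Local Refine) suggest a sampler with exactly two network evaluations. A minimal sketch of such a hybrid is given below; the function names, the noise-to-data time convention (t=0 noise, t=1 action), and the re-noising level are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def hybrid_flow_sample(mean_flow, inst_flow, obs, action_dim, refine_t=0.9, rng=None):
    """Hypothetical 2-NFE sampler: Global Jump -> ReNoise -> Local Refine.

    mean_flow(x, r, t, obs): average-velocity network (MeanFlow mode)
    inst_flow(x, t, obs):    instantaneous-velocity network (ReFlow mode)
    """
    rng = rng or np.random.default_rng()
    # Stage 1: Global Jump -- one MeanFlow evaluation maps noise to a coarse action.
    x0 = rng.standard_normal(action_dim)
    coarse = x0 + mean_flow(x0, 0.0, 1.0, obs)                        # NFE 1
    # Stage 2: ReNoise -- blend in fresh noise to re-align the sample with
    # the intermediate marginal at time refine_t (no network call).
    x_t = refine_t * coarse + (1.0 - refine_t) * rng.standard_normal(action_dim)
    # Stage 3: Local Refine -- one Euler step of the instantaneous-velocity ODE.
    action = x_t + (1.0 - refine_t) * inst_flow(x_t, refine_t, obs)   # NFE 2
    return action
```

The key design point is that only stages 1 and 3 touch a network, so latency stays close to a pure one-step MeanFlow policy while the refinement step corrects coarse errors.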
Related papers
- FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference [10.34801095627052]
Flow-matching models deliver state-of-the-art fidelity in image and video generation, but their inherently sequential denoising process makes them slow. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow matching models. Experiments demonstrate a speedup of over 2.6x while maintaining high-quality outputs.
arXiv Detail & Related papers (2026-02-11T18:21:11Z) - RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation [12.979642182577157]
Mean flow (MeanFlow) enables efficient, high-fidelity image generation, yet its single-function-evaluation (1-NFE) generation often cannot yield compelling results. We introduce RMFlow, an efficient multimodal generative model that integrates a coarse 1-NFE MeanFlow transport with a tailored noise-injection refinement step. RMFlow achieves near state-of-the-art results on text-to-image, context-to-molecule, and time-series generation using only 1-NFE, at a computational cost comparable to the baseline MeanFlows.
arXiv Detail & Related papers (2026-01-31T18:27:05Z) - ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge [11.016302257907936]
Vision-Language-Action (VLA) models have emerged as a unified paradigm for robotic perception and control. Current VLA models operate at only 3-5 Hz on edge devices due to the memory-bound nature of autoregressive decoding. We introduce ActionFlow, a system-level inference framework tailored for resource-constrained edge platforms.
arXiv Detail & Related papers (2025-12-23T11:29:03Z) - ARMFlow: AutoRegressive MeanFlow for Online 3D Human Reaction Generation [48.716675019745885]
3D human reaction generation faces three main challenges: high motion fidelity, real-time inference, and autoregressive adaptability for online scenarios. We propose ARMFlow, a MeanFlow-based autoregressive framework that models temporal dependencies between motions and velocity. Our single-step online generation surpasses existing methods on InterHuman and InterX by over 40% in FID, while matching offline state-of-the-art performance despite using only partial sequence conditions.
arXiv Detail & Related papers (2025-12-18T06:28:42Z) - Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories [14.36205662558203]
Rectified MeanFlow (Re-MeanFlow) is a framework that models the mean velocity field along the rectified trajectory using only a single reflow step. Experiments on ImageNet at 64, 256, and 512 resolutions show that Re-MeanFlow consistently outperforms prior one-step flow distillation and Rectified Flow methods in both sample quality and training efficiency.
arXiv Detail & Related papers (2025-11-28T16:50:08Z) - Understanding, Accelerating, and Improving MeanFlow Training [64.84964628592418]
MeanFlow promises high-quality generative modeling in few steps by jointly learning instantaneous and average velocity fields. We analyze the interaction between the two velocities and find that a well-established instantaneous velocity is a prerequisite for learning the average velocity. We design an effective training scheme that accelerates the formation of the instantaneous velocity, then shifts emphasis from short- to long-interval average velocity.
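For reference, the relation between the two velocity fields that this analysis builds on can be written down explicitly. This is the standard MeanFlow formulation sketched from memory, with the usual symbols rather than notation taken from the abstract above:

```latex
% Average velocity u over an interval [r, t] along a trajectory z_\tau,
% and the MeanFlow identity obtained by differentiating
% (t - r) u = \int_r^t v \, d\tau with respect to t:
u(z_t, r, t) \;\triangleq\; \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,
\qquad
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\,\frac{d}{dt}\, u(z_t, r, t).
```

The identity makes the dependence explicit: the average velocity is anchored to the instantaneous velocity, which is consistent with the finding that the latter must be learned well first.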
arXiv Detail & Related papers (2025-11-24T12:59:27Z) - One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow [56.13949180229929]
We introduce a one-step generative policy for offline reinforcement learning that maps noise directly to actions via a residual reformulation of MeanFlow. Our method achieves strong performance in both offline and offline-to-online reinforcement learning settings.
arXiv Detail & Related papers (2025-11-17T06:34:17Z) - DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation [23.382067451764396]
Flow-based generative models have emerged as a promising solution for learning distributions of actions. Existing flow-based policies suffer from representation collapse, the inability to distinguish similar visual representations, leading to failures in precise manipulation tasks. We propose DM1, a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency.
arXiv Detail & Related papers (2025-10-09T07:12:20Z) - OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows [59.052955667723985]
We present OneFlow, the first non-autoregressive multimodal model that enables variable-length and concurrent mixed-modal generation. Unlike autoregressive models that enforce rigid causal ordering between text and image generation, OneFlow combines an insertion-based Edit Flow for discrete text tokens with Flow Matching for image latents.
arXiv Detail & Related papers (2025-10-03T20:40:30Z) - MeanFlowSE: one-step generative speech enhancement via conditional mean flow [13.437825847370442]
MeanFlowSE is a conditional generative model that learns the average velocity over finite intervals along a trajectory. On VoiceBank-DEMAND, the single-step model achieves strong intelligibility, fidelity, and perceptual quality with substantially lower computational cost than multistep baselines.
arXiv Detail & Related papers (2025-09-18T11:24:47Z) - FlowTS: Time Series Generation via Rectified Flow [67.41208519939626]
FlowTS is an ODE-based model that leverages rectified flow with straight-line transport in probability space. In the unconditional setting, FlowTS achieves state-of-the-art performance, with context FID scores of 0.019 and 0.011 on the Stock and ETTh datasets. In the conditional setting, it achieves superior performance on solar forecasting.
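Several entries above (FlowTS, PeRFlow, Re-MeanFlow) rest on rectified flow's straight-line transport, under which a plain Euler ODE sampler stays accurate with very few steps. A generic sketch of that sampler follows; the `velocity` network and step count are placeholders, not details from any paper listed here:

```python
import numpy as np

def rectified_flow_sample(velocity, x0, steps=16):
    """Hypothetical Euler sampler for a rectified-flow ODE dx/dt = v(x, t).

    When trajectories are rectified (nearly straight), v is almost constant
    in t, so even a coarse Euler discretization incurs little error.
    """
    x, dt = x0.copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)  # one Euler step along the learned flow
    return x
```

In the limiting case of a perfectly straight trajectory, the Euler sum telescopes exactly to the endpoint regardless of the step count, which is why reflow-style straightening permits one-step generation.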
arXiv Detail & Related papers (2024-11-12T03:03:23Z) - PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator [73.80050807279461]
Piecewise Rectified Flow (PeRFlow) is a flow-based method for accelerating diffusion models.
PeRFlow achieves superior performance in a few-step generation.
arXiv Detail & Related papers (2024-05-13T07:10:53Z) - Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling [49.215957313126324]
Recently developed generative methods, including invertible rescaling network (IRN) based and generative adversarial network (GAN) based methods, have demonstrated exceptional performance in image rescaling.
However, IRN-based methods tend to produce over-smoothed results, while GAN-based methods easily generate fake details.
We propose Boundary-aware Decoupled Flow Networks (BDFlow) to generate realistic and visually pleasing results.
arXiv Detail & Related papers (2024-05-05T14:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.