Related papers: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

URL: http://arxiv.org/abs/2512.05150v1
Date: Wed, 03 Dec 2025 07:45:46 GMT
Title: TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Authors: Zhenglin Cheng, Peng Sun, Jianguo Li, Tao Lin,
Abstract summary: We propose TwinFlow, a framework for training 1-step generative models.<n>Our method achieves a GenEval score of 0.83 in 1-NFE on text-to-image tasks.<n>Our approach matches the performance of the original 100-NFE model on GenEval and DPG-Bench benchmarks.
Score: 25.487712175353035
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically built upon multi-step frameworks like diffusion and flow matching, which inherently limits their inference efficiency (requiring 40-100 Number of Function Evaluations (NFEs)). While various few-step methods aim to accelerate the inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (< 4-NFE). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need of fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 in 1-NFE, outperforming strong baselines like SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B and transform it into an efficient few-step generator. With just 1-NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by $100\times$ with minor quality degradation. Project page is available at https://zhenglin-cheng.com/twinflow.

Related papers

SoFlow: Solution Flow Models for One-Step Generative Modeling [10.054000663262618]
Flow Models (SoFlow) is a framework for one-step generation from scratch.<n>Flow Matching loss allows our models to provide estimated velocity fields during training.<n>Our models achieve better FID-50K scores than MeanFlow models on the ImageNet 256x256 dataset.
arXiv Detail & Related papers (2025-12-17T18:10:17Z)
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models [100.28111930893188]
Some of today's best generative models still require hundreds to thousands of neural function evaluations to compute a single likelihood.<n>We present fast flow joint distillation (F2D2), a framework that simultaneously reduces the number of NFEs required for both sampling and likelihood evaluation by two orders of magnitude.<n>F2D2 is modular, compatible with existing flow-based few-step sampling models, and requires only an additional divergence prediction head.
arXiv Detail & Related papers (2025-12-02T10:48:20Z)
Adversarial Flow Models [26.917627135225118]
We present adversarial flow models, a class of generative models that unifies adversarial models and flow models.<n>Our method supports native one-step or multi-step generation and is trained using the adversarial objective.<n>We show the possibility of end-to-end training of 56-layer and 112-layer models through depth repetition without any intermediate supervision.
arXiv Detail & Related papers (2025-11-27T14:04:08Z)
Flow-Anchored Consistency Models [32.04797599813587]
Continuous-time Consistency Models (CMs) promise efficient few-step generation but face challenges with training instability.<n>We argue this instability stems from a fundamental conflict: by training a network to learn only a shortcut across a probability flow, the model loses its grasp on the instantaneous velocity field that defines the flow.<n>We introduce the Flow-Anchored Consistency Model (FACM), a simple but effective training strategy that uses a Flow Matching task as an anchor for the primary CM shortcut objective.
arXiv Detail & Related papers (2025-07-04T17:56:51Z)
Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media.<n>On Imagenet-1k 512px, DFM achieves 35.2% improvements in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z)
Align Your Flow: Scaling Continuous-Time Flow Map Distillation [63.927438959502226]
Flow maps connect any two noise levels in a single step and remain effective across all step counts.<n>We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks.<n>We show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
arXiv Detail & Related papers (2025-06-17T15:06:07Z)
Mean Flows for One-step Generative Modeling [64.4997821467102]
We propose a principled and effective framework for one-step generative modeling.<n>A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training.<n>Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning.
arXiv Detail & Related papers (2025-05-19T17:59:42Z)
Self-Corrected Flow Distillation for Consistent One-Step and Few-Step Text-to-Image Generation [3.8959351616076745]
Flow matching has emerged as a promising framework for training generative models.<n>We introduce a self-corrected flow distillation method that integrates consistency models and adversarial training.<n>This work is a pioneer in achieving consistent generation quality in both few-step and one-step sampling.
arXiv Detail & Related papers (2024-12-22T07:48:49Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.<n>We empirically find that this training paradigm limits the one-step generation performance of consistency models.<n>We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image. Our method enables fully offline training with just noise/image pairs from the diffusion model. We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5times$ larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.