CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
- URL: http://arxiv.org/abs/2509.24526v1
- Date: Mon, 29 Sep 2025 09:42:08 GMT
- Title: CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
- Authors: Zheyuan Hu, Chieh-Hsin Lai, Yuki Mitsufuji, Stefano Ermon
- Abstract summary: Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models. We introduce mid-training, the first concept and practical method that inserts a lightweight intermediate stage between the (diffusion) pre-training and the final flow map training.
- Score: 75.81132530657682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion model helps, but still requires converting infinitesimal steps into a long-jump map, leaving instability unresolved. We introduce mid-training, the first concept and practical method that inserts a lightweight intermediate stage between the (diffusion) pre-training and the final flow map training (i.e., post-training) for vision generation. Concretely, Consistency Mid-Training (CMT) is a compact and principled stage that trains a model to map points along a solver trajectory from a pre-trained model, starting from a prior sample, directly to the solver-generated clean sample. It yields a trajectory-consistent and stable initialization. This initializer outperforms random and diffusion-based baselines and enables fast, robust convergence without heuristics. Initializing post-training with CMT weights further simplifies flow map learning. Empirically, CMT achieves state-of-the-art two-step FIDs: 1.97 on CIFAR-10, 1.32 on ImageNet 64x64, and 1.84 on ImageNet 512x512, while using up to 98% less training data and GPU time compared to CMs. On ImageNet 256x256, CMT reaches a 1-step FID of 3.34 while cutting total training time by about 50% compared to MF trained from scratch (FID 3.43). This establishes CMT as a principled, efficient, and general framework for training flow map models.
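The mid-training stage described in the abstract can be pictured with a toy sketch. This is not the authors' implementation. It only illustrates the idea of pairing each point on a solver trajectory with that trajectory's clean endpoint. The teacher velocity field, the Euler solver, the time grid, the squared-error loss, and every function name below are assumptions made for illustration.

```python
import numpy as np

def teacher_velocity(x, t, mu=2.0):
    # Hypothetical stand-in for a pre-trained model: dx/dt when all data sits at mu.
    return (x - mu) / t

def solver_trajectory(x_prior, velocity, n_steps=8):
    # Euler solve from t=1 (prior sample) down to t=0 (clean sample).
    # Returns the time grid and the stacked states, shape (n_steps + 1, batch).
    times = np.linspace(1.0, 0.0, n_steps + 1)
    states = [x_prior]
    for t_cur, t_next in zip(times[:-1], times[1:]):
        x = states[-1]
        states.append(x + (t_next - t_cur) * velocity(x, t_cur))
    return times, np.stack(states)

def cmt_pairs(times, states):
    # Pair every non-final trajectory point (x_t, t) with the solver's clean endpoint.
    n_points, batch = states.shape
    x_in = states[:-1].reshape(-1)
    t_in = np.repeat(times[:-1], batch)
    target = np.tile(states[-1], n_points - 1)
    return x_in, t_in, target

def regression_loss(model, x_in, t_in, target):
    # Assumed squared-error objective; the abstract does not specify the loss.
    return float(np.mean((model(x_in, t_in) - target) ** 2))
```

In this sketch a `model(x, t)` that drives `regression_loss` toward zero maps any trajectory point straight to the clean sample, which is the kind of trajectory-consistent initializer the abstract describes handing to post-training.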
Related papers
- Diffusion As Self-Distillation: End-to-End Latent Diffusion In One Model [53.77953728335891]
Latent Diffusion Models rely on a complex, three-part architecture consisting of a separate encoder, decoder, and diffusion network. We propose Diffusion as Self-Distillation (DSD), a new framework with key modifications to the training objective that stabilize the latent space. This approach enables, for the first time, the stable end-to-end training of a single network that simultaneously learns to encode, decode, and perform diffusion.
arXiv Detail & Related papers (2025-11-18T17:58:16Z)
- MeanFlow Transformers with Representation Autoencoders [71.45823902973349]
MeanFlow (MF) is a diffusion-motivated generative model that enables efficient few-step generation by learning long jumps directly from noise to data. We develop an efficient training and sampling scheme for MF in the latent space of a Representation Autoencoder (RAE). We achieve a 1-step FID of 2.03, outperforming vanilla MF's 3.43, while reducing sampling GFLOPS by 38% and total training cost by 83% on ImageNet 256.
arXiv Detail & Related papers (2025-11-17T06:17:08Z)
- Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling [68.76215229126886]
We introduce Decoupled MeanFlow, a simple decoding strategy that converts flow models into flow map models without architectural modifications. Our method conditions the final blocks of diffusion transformers on the subsequent timestep, allowing pretrained flow models to be directly repurposed as flow maps. On ImageNet 256x256 and 512x512, our models attain 1-step FID of 2.16 and 2.12, respectively, surpassing prior art by a large margin.
arXiv Detail & Related papers (2025-10-28T14:43:48Z)
- Flow-Anchored Consistency Models [32.04797599813587]
Continuous-time Consistency Models (CMs) promise efficient few-step generation but face challenges with training instability. We argue this instability stems from a fundamental conflict: by training a network to learn only a shortcut across a probability flow, the model loses its grasp on the instantaneous velocity field that defines the flow. We introduce the Flow-Anchored Consistency Model (FACM), a simple but effective training strategy that uses a Flow Matching task as an anchor for the primary CM shortcut objective.
arXiv Detail & Related papers (2025-07-04T17:56:51Z)
- Mean Flows for One-step Generative Modeling [64.4997821467102]
We propose a principled and effective framework for one-step generative modeling. A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training. Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning.
arXiv Detail & Related papers (2025-05-19T17:59:42Z)
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints. We empirically find that this training paradigm limits the one-step generation performance of consistency models. We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models [7.254959022456085]
Consistency models (CMs) are a powerful class of diffusion-based generative models for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce additional hyperparameters and are prone to discretization errors. We propose a simplified theoretical framework that unifies previous parameterizations of diffusion models and CMs, identifying the root causes of instability. Our proposed training algorithm achieves FID scores of 2.06 on CIFAR-10, 1.48 on ImageNet 64x64, and 1.88 on ImageNet 512x512, narrowing the gap in FID scores with the
arXiv Detail & Related papers (2024-10-14T20:43:25Z)
- Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM). CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance. Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.