One-step Latent-free Image Generation with Pixel Mean Flows
- URL: http://arxiv.org/abs/2601.22158v1
- Date: Thu, 29 Jan 2026 18:59:56 GMT
- Title: One-step Latent-free Image Generation with Pixel Mean Flows
- Authors: Yiyang Lu, Susie Lu, Qiao Sun, Hanhong Zhao, Zhicheng Jiang, Xianbang Wang, Tianhong Li, Zhengyang Geng, Kaiming He,
- Abstract summary: We propose "pixel MeanFlow" (pMF) to formulate the network output space and the loss space separately. pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern diffusion/flow-based models for image generation typically exhibit two core characteristics: (i) using multi-step sampling, and (ii) operating in a latent space. Recent advances have made encouraging progress on each aspect individually, paving the way toward one-step diffusion/flow without latents. In this work, we take a further step towards this goal and propose "pixel MeanFlow" (pMF). Our core guideline is to formulate the network output space and the loss space separately. The network target is designed to be on a presumed low-dimensional image manifold (i.e., x-prediction), while the loss is defined via MeanFlow in the velocity space. We introduce a simple transformation between the image manifold and the average velocity field. In experiments, pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID), filling a key missing piece in this regime. We hope that our study will further advance the boundaries of diffusion/flow-based generative models.
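To illustrate the separation between output space and loss space described in the abstract, here is a minimal sketch of how an x-prediction can be converted into an average velocity for one-step sampling. It assumes the common linear-interpolation convention z_t = (1 - t) x + t eps; the function names and the toy stand-in network are hypothetical, not the paper's actual transformation or architecture.

```python
import numpy as np

def x_pred_to_avg_velocity(z_t, x_hat, t):
    # For a straight-line trajectory through z_t aimed at the predicted
    # clean image x_hat, the (constant) average velocity over any
    # interval ending at time t is (z_t - x_hat) / t.
    return (z_t - x_hat) / t

def one_step_sample(x_hat_fn, shape, rng):
    # One-step generation: move from pure noise at t = 1 to t = 0
    # along the predicted average velocity: z_0 = z_1 - (1 - 0) * u.
    z1 = rng.standard_normal(shape)
    u = x_pred_to_avg_velocity(z1, x_hat_fn(z1), t=1.0)
    return z1 - u

rng = np.random.default_rng(0)
x_hat_fn = lambda z: 0.5 * z  # toy stand-in for the x-prediction network
sample = one_step_sample(x_hat_fn, (4, 4), rng)
```

Note that at t = 1 the one-step sample reduces exactly to the network's x-prediction, consistent with the idea that the output is designed to lie on the image manifold.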
Related papers
- Generative Modeling via Drifting [63.351930190408545]
We propose a new paradigm called Drifting Models, which evolve the pushforward distribution during training and naturally admit one-step inference. In experiments, our one-step generator achieves state-of-the-art results on ImageNet at 256 x 256 resolution, with an FID of 1.54 in latent space and 1.61 in pixel space.
arXiv Detail & Related papers (2026-02-04T17:06:49Z)
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation [93.6273078684831]
We propose a frequency-DeCoupled pixel diffusion framework to pursue a more efficient pixel diffusion paradigm. With the intuition to decouple the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance. Experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FIDs of 1.62 (256x256) and 2.22 (512x512) on ImageNet.
arXiv Detail & Related papers (2025-11-24T17:59:06Z)
- Balanced conic rectified flow [19.226787997122987]
Rectified flow is a generative model that learns smooth transport mappings between two distributions through an ordinary differential equation (ODE). In this work, we experimentally expose the limitations of the original rectified flow and propose a novel approach that incorporates real images into the training process.
arXiv Detail & Related papers (2025-10-29T07:06:01Z)
- Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training [23.632047555553324]
We introduce a novel two-stage training framework for pixel-space diffusion and consistency models. Our training framework demonstrates strong empirical performance on the ImageNet dataset. To the best of our knowledge, this marks the first successful training of a consistency model directly on high-resolution images.
arXiv Detail & Related papers (2025-10-14T14:41:16Z)
- Mean Flows for One-step Generative Modeling [64.4997821467102]
We propose a principled and effective framework for one-step generative modeling. A well-defined identity between average and instantaneous velocities is derived and used to guide neural network training. Our method, termed the MeanFlow model, is self-contained and requires no pre-training, distillation, or curriculum learning.
arXiv Detail & Related papers (2025-05-19T17:59:42Z)
- ProReflow: Progressive Reflow with Decomposed Velocity [52.249464542399636]
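The identity between average and instantaneous velocities mentioned here can be sketched numerically: with u(z, r, t) the average velocity over [r, t] and v(z, t) the instantaneous velocity, MeanFlow uses u(z_t, r, t) = v(z_t, t) - (t - r) d/dt u(z_t, r, t), where d/dt is the total derivative along the trajectory. Below is a minimal finite-difference sketch of the resulting regression target; the actual method computes the derivative with a JVP, and the names here are assumptions, not the paper's API.

```python
import numpy as np

def meanflow_target(u_fn, z_t, v_t, r, t, eps=1e-4):
    # Total derivative d/dt u(z_t, r, t) along the trajectory, where
    # (dz/dt, dr/dt, dt/dt) = (v_t, 0, 1), via a forward difference.
    du_dt = (u_fn(z_t + eps * v_t, r, t + eps) - u_fn(z_t, r, t)) / eps
    # The target is treated as a constant (stop-gradient) in the loss.
    return v_t - (t - r) * du_dt

# Toy check: for a straight-line trajectory the average velocity is
# constant, so the target reduces to the instantaneous velocity itself.
u_const = lambda z, r, t: np.full_like(z, 2.0)
target = meanflow_target(u_const, np.zeros(3), np.full(3, 2.0), 0.0, 1.0)
```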
Flow matching aims to reflow the diffusion process of diffusion models into a straight line for few-step and even one-step generation. We introduce progressive reflow, which progressively reflows the diffusion models in local timesteps until the whole diffusion process is covered. We also introduce aligned v-prediction, which highlights the importance of direction matching in flow matching over magnitude matching.
arXiv Detail & Related papers (2025-03-05T04:50:53Z)
- One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR method based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z)
- MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize [18.73205699076486]
We introduce a diffusion framework leveraging multi-scale latent factorization. Our framework decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal and a high-frequency residual. Our proposed architecture facilitates reduced sampling steps during the residual-learning stage.
arXiv Detail & Related papers (2025-01-23T03:18:23Z)
- One-step Diffusion with Distribution Matching Distillation [54.723565605974294]
We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator.
We enforce that the one-step image generator matches the diffusion model at the distribution level by minimizing an approximate KL divergence.
Our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k.
arXiv Detail & Related papers (2023-11-30T18:59:20Z) - Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation [2.5556910002263984]
Score-based diffusion models (SBDM) have emerged as state-of-the-art approaches for image generation.
This paper develops SBDMs in the infinite-dimensional setting, that is, we model the training data as functions supported on a rectangular domain.
We demonstrate how to overcome two shortcomings of current SBDM approaches in the infinite-dimensional setting.
arXiv Detail & Related papers (2023-03-08T18:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.