NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers
- URL: http://arxiv.org/abs/2503.09242v2
- Date: Tue, 05 Aug 2025 08:21:33 GMT
- Title: NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers
- Authors: Yuhang Ma, Bo Cheng, Shanyuan Liu, Hongyi Zhou, Liebucha Wu, Xiaoyu Wu, Dawei Leng, Yuhui Yin,
- Abstract summary: Flow-based Transformer models have achieved state-of-the-art image generation performance, but often suffer from high inference latency and computational cost.<n>We propose Bridged Progressive Rectified Flow Transformers (NAMI), which decompose the generation process across temporal, spatial, and architectural demensions.
- Score: 10.84639914909133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Flow-based Transformer models have achieved state-of-the-art image generation performance, but often suffer from high inference latency and computational cost due to their large parameter sizes. To improve inference efficiency without compromising quality, we propose Bridged Progressive Rectified Flow Transformers (NAMI), which decompose the generation process across temporal, spatial, and architectural demensions. We divide the rectified flow into different stages according to resolution, and use a BridgeFlow module to connect them. Fewer Transformer layers are used at low-resolution stages to generate image layouts and concept contours, and more layers are progressively added as the resolution increases. Experiments demonstrate that our approach achieves fast convergence and reduces inference time while ensuring generation quality. The main contributions of this paper are summarized as follows: (1) We introduce Bridged Progressive Rectified Flow Transformers that enable multi-resolution training, accelerating model convergence; (2) NAMI leverages piecewise flow and spatial cascading of Diffusion Transformer (DiT) to rapidly generate images, reducing inference time by 64% for generating 1024 resolution images; (3) We propose a BridgeFlow module to align flows between different stages; (4) We propose the NAMI-1K benchmark to evaluate human preference performance, aiming to mitigate distributional bias and comprehensively assess model effectiveness. The results show that our model is competitive with state-of-the-art models.
Related papers
- Straighten Viscous Rectified Flow via Noise Optimization [24.065483360595458]
The Reflow operation aims to straighten the inference trajectories of the rectified flow during training by constructing deterministic couplings between noises and images.<n>We identify critical limitations in Reflow, particularly its inability to rapidly generate high-quality images due to a distribution gap between images in its constructed deterministic couplings and real images.<n>We propose a novel alternative called Straighten Viscous Rectified Flow via Noise Optimization (VRFNO), which is a joint training framework integrating an encoder and a neural velocity field.
arXiv Detail & Related papers (2025-07-14T12:35:17Z) - STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis [44.2114053357308]
We present a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis.<n>The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers.
arXiv Detail & Related papers (2025-06-06T17:58:39Z) - FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution [33.07779971446476]
We propose FlowDCN, a purely convolution-based generative model that can efficiently generate high-quality images at arbitrary resolutions.
FlowDCN achieves the state-of-the-art 4.30 sFID on $256times256$ ImageNet Benchmark and comparable resolution extrapolation results.
We believe FlowDCN offers a promising solution to scalable and flexible image synthesis.
arXiv Detail & Related papers (2024-10-30T02:48:50Z) - A Wavelet Diffusion GAN for Image Super-Resolution [7.986370916847687]
Diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation.
However, their real-time feasibility is hindered by slow training and inference speeds.
This study proposes a wavelet-based conditional Diffusion GAN scheme for Single-Image Super-Resolution.
arXiv Detail & Related papers (2024-10-23T15:34:06Z) - FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model [76.84519526283083]
We present the textbfFlexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with textitunrestricted resolutions and aspect ratios
FiTv2 exhibits $2times$ convergence speed of FiT, when incorporating advanced training-free extrapolation techniques.
Comprehensive experiments demonstrate the exceptional performance of FiTv2 across a broad range of resolutions.
arXiv Detail & Related papers (2024-10-17T15:51:49Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR)
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner [70.90505084288057]
Flow-based models tend to produce a straighter sampling trajectory during the sampling process.
We introduce several techniques including a pseudo corrector and sample-aware compilation to further reduce inference time.
FlowTurbo reaches an FID of 2.12 on ImageNet with 100 (ms / img) and FID of 3.93 with 38 (ms / img)
arXiv Detail & Related papers (2024-09-26T17:59:51Z) - Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization [26.926712014346432]
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.<n>Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512.
arXiv Detail & Related papers (2024-06-13T17:59:58Z) - FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-06-01T17:29:29Z) - Make a Cheap Scaling: A Self-Cascade Diffusion Model for
Higher-Resolution Adaptation [112.08287900261898]
This paper proposes a novel self-cascade diffusion model for rapid adaptation to higher-resolution image and video generation.
Our approach achieves a 5X training speed-up and requires only an additional 0.002M tuning parameters.
Experiments demonstrate that our approach can quickly adapt to higher resolution image and video synthesis by fine-tuning for just 10k steps, with virtually no additional inference time.
arXiv Detail & Related papers (2024-02-16T07:48:35Z) - CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster
Image Generation [49.3016007471979]
Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks.
However, their widespread adoption is hindered by the high computational cost, which limits their real-time application.
We introduce a novel method dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs.
arXiv Detail & Related papers (2023-10-02T17:59:18Z) - ResShift: Efficient Diffusion Model for Image Super-resolution by
Residual Shifting [70.83632337581034]
Diffusion-based image super-resolution (SR) methods are mainly limited by the low inference speed.
We propose a novel and efficient diffusion model for SR that significantly reduces the number of diffusion steps.
Our method constructs a Markov chain that transfers between the high-resolution image and the low-resolution image by shifting the residual.
arXiv Detail & Related papers (2023-07-23T15:10:02Z) - StraIT: Non-autoregressive Generation with Stratified Image Transformer [63.158996766036736]
Stratified Image Transformer(StraIT) is a pure non-autoregressive(NAR) generative model.
Our experiments demonstrate that StraIT significantly improves NAR generation and out-performs existing DMs and AR methods.
arXiv Detail & Related papers (2023-03-01T18:59:33Z) - Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR)
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.