I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
- URL: http://arxiv.org/abs/2410.07536v2
- Date: Tue, 15 Oct 2024 02:26:29 GMT
- Title: I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
- Authors: Ruoyi Du, Dongyang Liu, Le Zhuo, Qin Qi, Hongsheng Li, Zhanyu Ma, Peng Gao,
- Abstract summary: Rectified Flow Transformers (RFTs) offer superior training and inference efficiency.
We introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs.
- Score: 50.55228067778858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rectified Flow Transformers (RFTs) offer superior training and inference efficiency, making them likely the most viable direction for scaling up diffusion models. However, progress in generation resolution has been relatively slow due to data quality and training costs. Tuning-free resolution extrapolation presents an alternative, but current methods often reduce generative stability, limiting practical application. In this paper, we review existing resolution extrapolation methods and introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs. I-Max features: (i) a novel Projected Flow strategy for stable extrapolation and (ii) an advanced inference toolkit for generalizing model knowledge to higher resolutions. Experiments with Lumina-Next-2K and Flux.1-dev demonstrate I-Max's ability to enhance stability in resolution extrapolation and show that it can bring image detail emergence and artifact correction, confirming the practical value of tuning-free resolution extrapolation.
Related papers
- FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model [76.84519526283083]
We present the textbfFlexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with textitunrestricted resolutions and aspect ratios
FiTv2 exhibits $2times$ convergence speed of FiT, when incorporating advanced training-free extrapolation techniques.
Comprehensive experiments demonstrate the exceptional performance of FiTv2 across a broad range of resolutions.
arXiv Detail & Related papers (2024-10-17T15:51:49Z) - Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR)
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z) - FiT: Flexible Vision Transformer for Diffusion Model [81.85667773832279]
We present a transformer architecture specifically designed for generating images with unrestricted resolutions and aspect ratios.
Unlike traditional methods that perceive images as static-resolution grids, FiT conceptualizes images as sequences of dynamically-sized tokens.
Comprehensive experiments demonstrate the exceptional performance of FiT across a broad range of resolutions.
arXiv Detail & Related papers (2024-02-19T18:59:07Z) - DFU: scale-robust diffusion model for zero-shot super-resolution image
generation [15.689418447376587]
We present a novel deep-learning architecture, Dual-FNO UNet (DFU), which approximates the score operator by combining both spatial and spectral information at multiple resolutions.
We propose a fine-tuning strategy to further enhance the zero-shot super-resolution image-generation capability of our model, leading to a FID of 11.3 at 1.66 times the maximum training resolution on FFHQ.
arXiv Detail & Related papers (2023-11-30T23:31:33Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - Implicit Diffusion Models for Continuous Super-Resolution [65.45848137914592]
This paper introduces an Implicit Diffusion Model (IDM) for high-fidelity continuous image super-resolution.
IDM integrates an implicit neural representation and a denoising diffusion model in a unified end-to-end framework.
The scaling factor regulates the resolution and accordingly modulates the proportion of the LR information and generated features in the final output.
arXiv Detail & Related papers (2023-03-29T07:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.