HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- URL: http://arxiv.org/abs/2504.06232v1
- Date: Tue, 08 Apr 2025 17:30:40 GMT
- Title: HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- Authors: Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
- Abstract summary: HiFlow is a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models.
- Score: 70.69373563281324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging this flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate HiFlow's superiority over current state-of-the-art methods in high-resolution image quality.
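The abstract only names the three guidance terms, so the snippet below is a minimal, hypothetical sketch of how initialization, direction, and acceleration alignment could be wired into a plain Euler flow sampler. The `velocity` callable, the cached low-resolution trajectory `lowres_traj`, and the weights `w_dir`/`w_acc` are illustrative assumptions, not HiFlow's released implementation.

```python
# Illustrative sketch only -- not the authors' code. Assumes a pre-trained flow
# model `velocity(x, t, prompt)` that predicts the flow velocity, and a cached
# low-resolution trajectory `lowres_traj` with one latent per step (steps + 1
# states). All names and weights are hypothetical.
import torch
import torch.nn.functional as F

def upsample(x, size):
    """Bilinearly lift a low-resolution latent [B, C, h, w] into high-resolution space."""
    return F.interpolate(x, size=size, mode="bilinear", align_corners=False)

@torch.no_grad()
def guided_highres_sampling(velocity, lowres_traj, prompt, hr_size,
                            steps=30, w_dir=0.5, w_acc=0.3):
    ts = torch.linspace(1.0, 0.0, steps + 1)            # noise (t=1) -> data (t=0)
    ref = [upsample(z, hr_size) for z in lowres_traj]   # virtual reference flow

    # Initialization alignment: start from the reference flow's initial state so the
    # low-frequency layout of the low-resolution result is preserved.
    x = ref[0] + 0.1 * torch.randn_like(ref[0])

    prev_v, prev_ref_v = None, None
    for k in range(steps):
        t, dt = ts[k], ts[k + 1] - ts[k]
        v = velocity(x, t, prompt)
        ref_v = (ref[k + 1] - ref[k]) / dt               # reference flow direction

        # Direction alignment: pull the high-res velocity toward the reference
        # direction to keep the overall structure.
        v = (1 - w_dir) * v + w_dir * ref_v

        # Acceleration alignment: also match the change of velocity between
        # steps, which mostly affects fine detail.
        if prev_v is not None:
            acc, ref_acc = v - prev_v, ref_v - prev_ref_v
            v = v + w_acc * (ref_acc - acc)

        prev_v, prev_ref_v = v, ref_v
        x = x + dt * v                                   # Euler step along the flow
    return x
```

The actual update rules, schedules, and weightings should be taken from the paper; the sketch only shows how the three alignment terms can map onto successive steps of an ordinary flow sampler.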
Related papers
- FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation [61.61415607972597]
DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale.
High content and motion fidelity aligned with text prompts, however, often requires large models and a substantial number of function evaluations (NFEs).
We propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality.
arXiv Detail & Related papers (2025-02-07T18:59:59Z)
- I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow [50.55228067778858]
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency.
We introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs.
arXiv Detail & Related papers (2024-10-10T02:08:23Z)
- UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks [36.61645124563195]
We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions.
We use semantics-rich representations of lower-resolution images in the later denoising stage to guide the whole generation of highly detailed high-resolution images.
Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images.
arXiv Detail & Related papers (2024-07-02T11:02:19Z)
- FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images (a minimal rectified-flow sampling sketch appears after this list).
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-06-01T17:29:29Z)
- CasSR: Activating Image Power for Real-World Image Super-Resolution [24.152495730507823]
CasSR (Cascaded diffusion for Super-Resolution) is a novel method designed to produce highly detailed and realistic images.
We develop a cascaded controllable diffusion model that aims to optimize the extraction of information from low-resolution images.
arXiv Detail & Related papers (2024-03-18T03:59:43Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose GMFlow, a framework for learning optical flow estimation via global matching.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation (a simplified sketch of the matching step appears at the end of this page).
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network that progressively processes smooth regions and details in a divide-and-conquer manner.
In particular, we propose a Hessian filtering for interpretable feature representation that is well suited to detail inference.
Experiments demonstrate that the proposed methods outperform state-of-the-art methods.
arXiv Detail & Related papers (2020-09-28T08:31:23Z)
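Several of the entries above (I-Max, FlowIE, and HiFlow itself) build on rectified flow, which learns a velocity field along near-straight paths between a simple noise distribution and the data. As a reference point for those summaries, here is a minimal, self-contained sketch of the standard rectified-flow interpolation and its Euler sampler; the toy MLP stands in for a real pre-trained velocity network and is not taken from any of the listed papers.

```python
# Minimal rectified-flow sketch: straight-line interpolation between noise and
# data, plus the Euler sampler used at inference. The tiny MLP is a stand-in
# for a real pre-trained velocity network.
import torch
import torch.nn as nn

class ToyVelocity(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

def training_pair(x1):
    """Rectified-flow target: x_t = (1 - t) * x0 + t * x1 has velocity x1 - x0."""
    x0 = torch.randn_like(x1)              # sample from the elementary distribution
    t = torch.rand(x1.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    return xt, t, x1 - x0                  # regress v(xt, t) onto x1 - x0

@torch.no_grad()
def sample(model, n=16, dim=2, steps=8):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = torch.randn(n, dim)
    for k in range(steps):
        t = torch.full((1, 1), k / steps)
        x = x + (1.0 / steps) * model(x, t)
    return x

model = ToyVelocity()
print(sample(model).shape)                 # torch.Size([16, 2])
```

Because the learned paths are nearly straight, a handful of Euler steps is often enough at inference time, which is the efficiency argument several of the papers above make.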
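For the GMFlow entry above, the core operation named in its summary (correlation followed by softmax for global feature matching) can be illustrated as follows. This is a simplified sketch, not the paper's implementation: feature extraction, the Transformer enhancement, and the self-attention propagation are omitted, and all shapes and names are illustrative.

```python
# Simplified sketch of global feature matching with correlation + softmax, in the
# spirit of the GMFlow summary above. Feature extraction and the Transformer /
# self-attention components are omitted; shapes and names are illustrative.
import torch

def global_matching_flow(feat1, feat2):
    """feat1, feat2: [B, C, H, W] feature maps of two frames -> dense flow [B, 2, H, W]."""
    b, c, h, w = feat1.shape
    f1 = feat1.flatten(2).transpose(1, 2)                   # [B, H*W, C]
    f2 = feat2.flatten(2).transpose(1, 2)                   # [B, H*W, C]

    # Pairwise correlation between every pixel of frame 1 and every pixel of frame 2.
    corr = torch.matmul(f1, f2.transpose(1, 2)) / c**0.5    # [B, H*W, H*W]
    match = torch.softmax(corr, dim=-1)                     # soft matching distribution

    # Grid of pixel coordinates (x, y) in frame 2.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).float().view(1, h * w, 2)   # [1, H*W, 2]

    # Expected matched coordinate in frame 2, minus the source coordinate, is the flow.
    matched = torch.matmul(match, coords.expand(b, -1, -1))            # [B, H*W, 2]
    flow = matched - coords
    return flow.transpose(1, 2).reshape(b, 2, h, w)

# Toy usage with random features:
flow = global_matching_flow(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(flow.shape)  # torch.Size([1, 2, 32, 32])
```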