HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- URL: http://arxiv.org/abs/2504.06232v2
- Date: Fri, 16 May 2025 13:11:45 GMT
- Title: HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- Authors: Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
- Abstract summary: HiFlow is a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models.
- Score: 70.69373563281324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. Recent approaches have investigated training-free strategies to enable high-resolution image synthesis with pre-trained models. However, these techniques often struggle with generating high-quality visuals and tend to exhibit artifacts or low-fidelity details, as they typically rely solely on the endpoint of the low-resolution sampling trajectory while neglecting intermediate states that are critical for preserving structure and synthesizing finer detail. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging such flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate HiFlow's capability in achieving superior high-resolution image quality over state-of-the-art methods.
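The three alignments suggest a compact guided-sampling loop. Below is a minimal sketch of the idea under stated assumptions: `v_theta`, `lowres_traj`, and the blend weight `w_dir` are hypothetical stand-ins, acceleration alignment is omitted for brevity, and the paper's exact guidance rules may differ.

```python
# Minimal sketch of flow-aligned guided sampling in the spirit of HiFlow.
# v_theta, lowres_traj, and w_dir are hypothetical; acceleration alignment
# is omitted, and the paper's exact rules may differ.
import torch
import torch.nn.functional as F

def up(x, size):
    # Lift a cached low-resolution state into the high-resolution space.
    return F.interpolate(x, size=size, mode="bilinear", align_corners=False)

@torch.no_grad()
def hiflow_sample(v_theta, lowres_traj, hr_size, w_dir=0.5):
    """lowres_traj: cached low-res states, one per timestep (noise -> data)."""
    steps = len(lowres_traj) - 1
    ts = torch.linspace(1.0, 0.0, steps + 1)
    # Initialization alignment: start from the upsampled low-res start state
    # so low-frequency content matches the reference flow.
    x = up(lowres_traj[0], hr_size)
    for i in range(steps):
        t, dt = ts[i], ts[i + 1] - ts[i]
        v = v_theta(x, t)  # high-res model velocity
        # Direction alignment: velocity of the virtual reference flow, i.e.
        # the finite difference of consecutive upsampled low-res states.
        ref_v = (up(lowres_traj[i + 1], hr_size) - up(lowres_traj[i], hr_size)) / dt
        x = x + dt * ((1 - w_dir) * v + w_dir * ref_v)  # blended Euler step
    return x
```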
Related papers
- HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling [1.9474278832087901]
HiWave is a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis. A user study confirmed HiWave's effectiveness: it was preferred over the state-of-the-art alternative in more than 80% of comparisons.
arXiv Detail & Related papers (2025-06-25T13:58:37Z)
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation [63.927438959502226]
Flow maps connect any two noise levels in a single step and remain effective across all step counts. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks. We show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
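As one concrete reading of "connect any two noise levels in a single step," a trained flow map reduces sampling to chaining a few jumps. The map `f(x, t, s)` and the schedule below are assumptions, not the paper's API:

```python
# Illustrative use of a flow map f(x, t, s) that jumps from noise level t
# directly to level s; f and the schedule are assumptions.
import torch

@torch.no_grad()
def flow_map_sample(f, x_noise, schedule=(1.0, 0.5, 0.0)):
    x = x_noise
    for t, s in zip(schedule[:-1], schedule[1:]):
        x = f(x, torch.tensor(t), torch.tensor(s))  # one jump per pair
    return x

# One-step generation is the degenerate schedule (1.0, 0.0); longer
# schedules trade extra evaluations for quality.
```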
arXiv Detail & Related papers (2025-06-17T15:06:07Z)
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis [44.2114053357308]
We present a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers.
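To make "normalizing flows + Autoregressive Transformers" concrete, here is a generic autoregressive affine flow step, not STARFlow's actual architecture: a strictly causal network `net` predicts per-token scale and shift from earlier tokens, so density evaluation is parallel while sampling inverts token by token.

```python
# Generic autoregressive affine flow, shown only to illustrate the idea;
# `net` is any strictly causal network (output at token t depends only on
# tokens < t), which is an assumption, not STARFlow's actual design.
import torch

def ar_flow_forward(net, x):
    """x: (B, T, D) latent tokens -> (z, log|det J|), computed in parallel."""
    log_s, b = net(x).chunk(2, dim=-1)
    z = (x - b) * torch.exp(-log_s)   # elementwise affine, diagonal Jacobian
    return z, -log_s.sum(dim=(1, 2))  # exact log-determinant

@torch.no_grad()
def ar_flow_invert(net, z):
    # Sampling inverts token by token: each token needs only the tokens
    # already decoded, exactly like autoregressive Transformer decoding.
    x = torch.zeros_like(z)
    for t in range(z.shape[1]):
        log_s, b = net(x).chunk(2, dim=-1)
        x[:, t] = z[:, t] * torch.exp(log_s[:, t]) + b[:, t]
    return x
```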
arXiv Detail & Related papers (2025-06-06T17:58:39Z)
- FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation [61.61415607972597]
DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. However, high content and motion fidelity aligned with text prompts often requires large model parameters and a substantial number of function evaluations (NFEs). We propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality.
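The stage split can be pictured as follows; `base`, `refiner`, the resolutions, and the step counts are invented for illustration:

```python
# Schematic of the two-stage capacity/NFE allocation; `base`, `refiner`,
# resolutions, and step counts are illustrative assumptions.
import torch.nn.functional as F

def two_stage_video(base, refiner, prompt):
    # Stage 1: many NFEs with the large model at low resolution, where
    # prompt-aligned content and motion are cheapest to get right.
    frames = base.sample(prompt, size=(270, 480), num_steps=50)  # (N, C, H, W)
    # Stage 2: upsample, then only a few NFEs at high resolution to add
    # fine detail without re-solving content and motion.
    frames = F.interpolate(frames, size=(1080, 1920), mode="bilinear")
    return refiner.refine(frames, prompt, num_steps=4)
```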
arXiv Detail & Related papers (2025-02-07T18:59:59Z)
- FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion [50.43304425256732]
FreeScale is a tuning-free inference paradigm that enables higher-resolution visual generation via scale fusion. It extends higher-resolution generation to both image and video models.
arXiv Detail & Related papers (2024-12-12T18:59:59Z)
- I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow [50.55228067778858]
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency.
We introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs.
arXiv Detail & Related papers (2024-10-10T02:08:23Z)
- UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks [36.61645124563195]
We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions.
We use semantics-rich representations of lower-resolution images in the later denoising stages to guide the generation of highly detailed high-resolution images.
Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images.
arXiv Detail & Related papers (2024-07-02T11:02:19Z)
- FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
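The efficiency claim rests on path straightness: a rectified flow with near-straight paths makes a coarse Euler solver nearly exact. A minimal sketch, assuming a hypothetical learned velocity network `v_theta`:

```python
# Few-step Euler sampler for a rectified (straight-path) flow; v_theta is
# a hypothetical learned velocity network, not FlowIE's actual interface.
import torch

@torch.no_grad()
def rectified_flow_enhance(v_theta, x0, steps=4):
    """x0: input from the elementary distribution (e.g., a coarse image)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.tensor(i * dt)
        x = x + dt * v_theta(x, t)  # straight paths keep big Euler steps accurate
    return x
```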
arXiv Detail & Related papers (2024-06-01T17:29:29Z)
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, enabling arbitrary-size, high-resolution, and high-quality generation.
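The two named ingredients translate directly into code. The sketch below applies them to a single convolution layer; the cutoff and dilation values are illustrative assumptions, not the paper's settings:

```python
# Sketch of the two named ingredients applied to one conv layer: a dilated
# convolution reusing the pre-trained kernel, plus an FFT low-pass so the
# dilated kernel sees no new aliases. Cutoff/dilation are illustrative.
import torch
import torch.nn.functional as F

def lowpass(x, cutoff):
    # Zero out spatial frequencies above `cutoff` (fraction of Nyquist).
    Xf = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    H, W = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-1, 1, H, device=x.device),
                            torch.linspace(-1, 1, W, device=x.device),
                            indexing="ij")
    mask = ((yy**2 + xx**2).sqrt() <= cutoff).to(x.dtype)
    return torch.fft.ifft2(torch.fft.ifftshift(Xf * mask, dim=(-2, -1))).real

def fouriscale_conv(x, weight, bias=None, scale=2):
    # Low-pass the input, then dilate the pre-trained kernel so its
    # receptive field grows with the target resolution.
    x = lowpass(x, cutoff=1.0 / scale)
    pad = (weight.shape[-1] // 2) * scale  # "same" padding for odd kernels
    return F.conv2d(x, weight, bias, padding=pad, dilation=scale)
```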
arXiv Detail & Related papers (2024-03-19T17:59:33Z)
- CasSR: Activating Image Power for Real-World Image Super-Resolution [24.152495730507823]
CasSR (Cascaded diffusion for Super-Resolution) is a novel method designed to produce highly detailed and realistic images.
We develop a cascaded controllable diffusion model that aims to optimize the extraction of information from low-resolution images.
arXiv Detail & Related papers (2024-03-18T03:59:43Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose GMFlow, a framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
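A toy version of the global-matching core, assuming the Transformer-enhanced features already exist: dense correlation between the two feature maps, a softmax over all target positions, and flow read out as the expected displacement.

```python
# Toy version of the global-matching core: dense correlation between two
# feature maps, softmax over all target positions, flow as the expected
# displacement. Feature extraction (the Transformer) is assumed elsewhere.
import torch

def global_matching_flow(f1, f2):
    """f1, f2: (B, C, H, W) feature maps of consecutive frames -> (B, 2, H, W)."""
    B, C, H, W = f1.shape
    src = f1.flatten(2).transpose(1, 2)        # (B, HW, C)
    tgt = f2.flatten(2)                        # (B, C, HW)
    corr = (src @ tgt) / C**0.5                # (B, HW, HW) correlation volume
    prob = corr.softmax(dim=-1)                # matching distribution per pixel
    ys, xs = torch.meshgrid(torch.arange(H, device=f1.device),
                            torch.arange(W, device=f1.device), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()  # (HW, 2)
    matched = prob @ grid                      # expected target coordinates
    flow = (matched - grid).transpose(1, 2).reshape(B, 2, H, W)
    return flow
```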
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network that progressively processes smooth regions and details in a divide-and-conquer manner. In particular, we propose a Hessian filter for interpretable feature representation that is well suited to detail inference. Experiments demonstrate that the proposed methods achieve superior performance over state-of-the-art methods.
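The abstract does not spell out the Hessian filter, so the following is only one plausible reading: second-derivative (Hessian) kernel responses as an interpretable per-pixel detail indicator, large on edges and texture and near zero on smooth regions.

```python
# One plausible "Hessian filter": second-derivative kernels whose responses
# are large exactly where fine detail lives. This is an assumed reading of
# the abstract, not the paper's exact operator.
import torch
import torch.nn.functional as F

def hessian_detail_map(x):
    """x: (B, 1, H, W) grayscale image -> per-pixel detail measure."""
    k = lambda m: torch.tensor(m, dtype=x.dtype, device=x.device).view(1, 1, 3, 3)
    dxx = F.conv2d(x, k([[0., 0., 0.], [1., -2., 1.], [0., 0., 0.]]), padding=1)
    dyy = F.conv2d(x, k([[0., 1., 0.], [0., -2., 0.], [0., 1., 0.]]), padding=1)
    dxy = F.conv2d(x, k([[.25, 0., -.25], [0., 0., 0.], [-.25, 0., .25]]),
                   padding=1)
    # Frobenius norm of the Hessian: high on edges/texture, ~0 on smooth areas.
    return (dxx**2 + dyy**2 + 2 * dxy**2).sqrt()
```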
arXiv Detail & Related papers (2020-09-28T08:31:23Z)