HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- URL: http://arxiv.org/abs/2504.06232v2
- Date: Fri, 16 May 2025 13:11:45 GMT
- Title: HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
- Authors: Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
- Abstract summary: HiFlow is a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models.
- Score: 70.69373563281324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. Recent approaches have investigated training-free strategies to enable high-resolution image synthesis with pre-trained models. However, these techniques often struggle with generating high-quality visuals and tend to exhibit artifacts or low-fidelity details, as they typically rely solely on the endpoint of the low-resolution sampling trajectory while neglecting intermediate states that are critical for preserving structure and synthesizing finer detail. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging such flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate HiFlow's capability in achieving superior high-resolution image quality over state-of-the-art methods.
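The three alignments suggest a compact guided-sampling loop. Below is a minimal sketch of the idea under stated assumptions: `v_theta`, `lowres_traj`, and the blend weight `w_dir` are hypothetical stand-ins, acceleration alignment is omitted for brevity, and the paper's exact guidance rules may differ.

```python
# Minimal sketch of flow-aligned guided sampling in the spirit of HiFlow.
# v_theta, lowres_traj, and w_dir are hypothetical; acceleration alignment
# is omitted, and the paper's exact rules may differ.
import torch
import torch.nn.functional as F

def up(x, size):
    # Lift a cached low-resolution state into the high-resolution space.
    return F.interpolate(x, size=size, mode="bilinear", align_corners=False)

@torch.no_grad()
def hiflow_sample(v_theta, lowres_traj, hr_size, w_dir=0.5):
    """lowres_traj: cached low-res states, one per timestep (noise -> data)."""
    steps = len(lowres_traj) - 1
    ts = torch.linspace(1.0, 0.0, steps + 1)
    # Initialization alignment: start from the upsampled low-res start state
    # so low-frequency content matches the reference flow.
    x = up(lowres_traj[0], hr_size)
    for i in range(steps):
        t, dt = ts[i], ts[i + 1] - ts[i]
        v = v_theta(x, t)  # high-res model velocity
        # Direction alignment: velocity of the virtual reference flow, i.e.
        # the finite difference of consecutive upsampled low-res states.
        ref_v = (up(lowres_traj[i + 1], hr_size) - up(lowres_traj[i], hr_size)) / dt
        x = x + dt * ((1 - w_dir) * v + w_dir * ref_v)  # blended Euler step
    return x
```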
Related papers
- HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling [1.9474278832087901]
HiWave is a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis. A user study confirmed HiWave's effectiveness: it was preferred over the state-of-the-art alternative in more than 80% of comparisons.
arXiv Detail & Related papers (2025-06-25T13:58:37Z)
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation [63.927438959502226]
Flow maps connect any two noise levels in a single step and remain effective across all step counts. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks. We show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
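As one concrete reading of "connect any two noise levels in a single step," a trained flow map reduces sampling to chaining a few jumps. The map `f(x, t, s)` and the schedule below are assumptions, not the paper's API:

```python
# Illustrative use of a flow map f(x, t, s) that jumps from noise level t
# directly to level s; f and the schedule are assumptions.
import torch

@torch.no_grad()
def flow_map_sample(f, x_noise, schedule=(1.0, 0.5, 0.0)):
    x = x_noise
    for t, s in zip(schedule[:-1], schedule[1:]):
        x = f(x, torch.tensor(t), torch.tensor(s))  # one jump per pair
    return x

# One-step generation is the degenerate schedule (1.0, 0.0); longer
# schedules trade extra evaluations for quality.
```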
arXiv Detail & Related papers (2025-06-17T15:06:07Z)
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis [44.2114053357308]
We present a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers.
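To make "normalizing flows + Autoregressive Transformers" concrete, here is a generic autoregressive affine flow step, not STARFlow's actual architecture: a strictly causal network `net` predicts per-token scale and shift from earlier tokens, so density evaluation is parallel while sampling inverts token by token.

```python
# Generic autoregressive affine flow, shown only to illustrate the idea;
# `net` is any strictly causal network (output at token t depends only on
# tokens < t), which is an assumption, not STARFlow's actual design.
import torch

def ar_flow_forward(net, x):
    """x: (B, T, D) latent tokens -> (z, log|det J|), computed in parallel."""
    log_s, b = net(x).chunk(2, dim=-1)
    z = (x - b) * torch.exp(-log_s)   # elementwise affine, diagonal Jacobian
    return z, -log_s.sum(dim=(1, 2))  # exact log-determinant

@torch.no_grad()
def ar_flow_invert(net, z):
    # Sampling inverts token by token: each token needs only the tokens
    # already decoded, exactly like autoregressive Transformer decoding.
    x = torch.zeros_like(z)
    for t in range(z.shape[1]):
        log_s, b = net(x).chunk(2, dim=-1)
        x[:, t] = z[:, t] * torch.exp(log_s[:, t]) + b[:, t]
    return x
```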
arXiv Detail & Related papers (2025-06-06T17:58:39Z)
- FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation [61.61415607972597]
DiT diffusion models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. However, high content and motion fidelity aligned with text prompts often requires large model parameters and a substantial number of function evaluations (NFEs). We propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality.
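The stage split can be pictured as follows; `base`, `refiner`, the resolutions, and the step counts are invented for illustration:

```python
# Schematic of the two-stage capacity/NFE allocation; `base`, `refiner`,
# resolutions, and step counts are illustrative assumptions.
import torch.nn.functional as F

def two_stage_video(base, refiner, prompt):
    # Stage 1: many NFEs with the large model at low resolution, where
    # prompt-aligned content and motion are cheapest to get right.
    frames = base.sample(prompt, size=(270, 480), num_steps=50)  # (N, C, H, W)
    # Stage 2: upsample, then only a few NFEs at high resolution to add
    # fine detail without re-solving content and motion.
    frames = F.interpolate(frames, size=(1080, 1920), mode="bilinear")
    return refiner.refine(frames, prompt, num_steps=4)
```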
arXiv Detail & Related papers (2025-02-07T18:59:59Z)
- FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion [50.43304425256732]
FreeScale is a tuning-free inference paradigm that enables higher-resolution visual generation via scale fusion. It extends higher-resolution generation to both image and video models.
arXiv Detail & Related papers (2024-12-12T18:59:59Z)
- I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow [50.55228067778858]
Rectified Flow Transformers (RFTs) offer superior training and inference efficiency.
We introduce the I-Max framework to maximize the resolution potential of Text-to-Image RFTs.
arXiv Detail & Related papers (2024-10-10T02:08:23Z)
- UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks [36.61645124563195]
We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions.
We use semantics-rich representations of lower-resolution images in the later denoising stages to guide the generation of highly detailed high-resolution images.
Our model achieves fast training with reduced data requirements, producing photo-realistic high-resolution images.
arXiv Detail & Related papers (2024-07-02T11:02:19Z)
- FlowIE: Efficient Image Enhancement via Rectified Flow [71.6345505427213]
FlowIE is a flow-based framework that estimates straight-line paths from an elementary distribution to high-quality images.
Our contributions are rigorously validated through comprehensive experiments on synthetic and real-world datasets.
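The efficiency claim rests on path straightness: a rectified flow with near-straight paths makes a coarse Euler solver nearly exact. A minimal sketch, assuming a hypothetical learned velocity network `v_theta`:

```python
# Few-step Euler sampler for a rectified (straight-path) flow; v_theta is
# a hypothetical learned velocity network, not FlowIE's actual interface.
import torch

@torch.no_grad()
def rectified_flow_enhance(v_theta, x0, steps=4):
    """x0: input from the elementary distribution (e.g., a coarse image)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.tensor(i * dt)
        x = x + dt * v_theta(x, t)  # straight paths keep big Euler steps accurate
    return x
```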
arXiv Detail & Related papers (2024-06-01T17:29:29Z)
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [48.9652334528436]
We introduce an innovative, training-free approach FouriScale from the perspective of frequency domain analysis.
We replace the original convolutional layers in pre-trained diffusion models by incorporating a dilation technique along with a low-pass operation.
Our method successfully balances the structural integrity and fidelity of generated images, enabling arbitrary-size, high-resolution, and high-quality generation.
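The two named ingredients translate directly into code. The sketch below applies them to a single convolution layer; the cutoff and dilation values are illustrative assumptions, not the paper's settings:

```python
# Sketch of the two named ingredients applied to one conv layer: a dilated
# convolution reusing the pre-trained kernel, plus an FFT low-pass so the
# dilated kernel sees no new aliases. Cutoff/dilation are illustrative.
import torch
import torch.nn.functional as F

def lowpass(x, cutoff):
    # Zero out spatial frequencies above `cutoff` (fraction of Nyquist).
    Xf = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    H, W = x.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-1, 1, H, device=x.device),
                            torch.linspace(-1, 1, W, device=x.device),
                            indexing="ij")
    mask = ((yy**2 + xx**2).sqrt() <= cutoff).to(x.dtype)
    return torch.fft.ifft2(torch.fft.ifftshift(Xf * mask, dim=(-2, -1))).real

def fouriscale_conv(x, weight, bias=None, scale=2):
    # Low-pass the input, then dilate the pre-trained kernel so its
    # receptive field grows with the target resolution.
    x = lowpass(x, cutoff=1.0 / scale)
    pad = (weight.shape[-1] // 2) * scale  # "same" padding for odd kernels
    return F.conv2d(x, weight, bias, padding=pad, dilation=scale)
```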
arXiv Detail & Related papers (2024-03-19T17:59:33Z)
- CasSR: Activating Image Power for Real-World Image Super-Resolution [24.152495730507823]
CasSR (Cascaded diffusion for Super-Resolution) is a novel method designed to produce highly detailed and realistic images.
We develop a cascaded controllable diffusion model that aims to optimize the extraction of information from low-resolution images.
arXiv Detail & Related papers (2024-03-18T03:59:43Z)
- GMFlow: Learning Optical Flow via Global Matching [124.57850500778277]
We propose GMFlow, a framework for learning optical flow estimation.
It consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation.
Our new framework outperforms 32-iteration RAFT on the challenging Sintel benchmark.
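A toy version of the global-matching core, assuming the Transformer-enhanced features already exist: dense correlation between the two feature maps, a softmax over all target positions, and flow read out as the expected displacement.

```python
# Toy version of the global-matching core: dense correlation between two
# feature maps, softmax over all target positions, flow as the expected
# displacement. Feature extraction (the Transformer) is assumed elsewhere.
import torch

def global_matching_flow(f1, f2):
    """f1, f2: (B, C, H, W) feature maps of consecutive frames -> (B, 2, H, W)."""
    B, C, H, W = f1.shape
    src = f1.flatten(2).transpose(1, 2)        # (B, HW, C)
    tgt = f2.flatten(2)                        # (B, C, HW)
    corr = (src @ tgt) / C**0.5                # (B, HW, HW) correlation volume
    prob = corr.softmax(dim=-1)                # matching distribution per pixel
    ys, xs = torch.meshgrid(torch.arange(H, device=f1.device),
                            torch.arange(W, device=f1.device), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()  # (HW, 2)
    matched = prob @ grid                      # expected target coordinates
    flow = (matched - grid).transpose(1, 2).reshape(B, 2, H, W)
    return flow
```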
arXiv Detail & Related papers (2021-11-26T18:59:56Z)
- Interpretable Detail-Fidelity Attention Network for Single Image Super-Resolution [89.1947690981471]
We propose a purposeful and interpretable detail-fidelity attention network that progressively processes smooth regions and details in a divide-and-conquer manner. In particular, we propose a Hessian filter for interpretable feature representation that is well suited to detail inference. Experiments demonstrate that the proposed methods achieve superior performance over state-of-the-art methods.
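The abstract does not spell out the Hessian filter, so the following is only one plausible reading: second-derivative (Hessian) kernel responses as an interpretable per-pixel detail indicator, large on edges and texture and near zero on smooth regions.

```python
# One plausible "Hessian filter": second-derivative kernels whose responses
# are large exactly where fine detail lives. This is an assumed reading of
# the abstract, not the paper's exact operator.
import torch
import torch.nn.functional as F

def hessian_detail_map(x):
    """x: (B, 1, H, W) grayscale image -> per-pixel detail measure."""
    k = lambda m: torch.tensor(m, dtype=x.dtype, device=x.device).view(1, 1, 3, 3)
    dxx = F.conv2d(x, k([[0., 0., 0.], [1., -2., 1.], [0., 0., 0.]]), padding=1)
    dyy = F.conv2d(x, k([[0., 1., 0.], [0., -2., 0.], [0., 1., 0.]]), padding=1)
    dxy = F.conv2d(x, k([[.25, 0., -.25], [0., 0., 0.], [-.25, 0., .25]]),
                   padding=1)
    # Frobenius norm of the Hessian: high on edges/texture, ~0 on smooth areas.
    return (dxx**2 + dyy**2 + 2 * dxy**2).sqrt()
```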
arXiv Detail & Related papers (2020-09-28T08:31:23Z)