From Sketch to Fresco: Efficient Diffusion Transformer with Progressive Resolution
- URL: http://arxiv.org/abs/2601.07462v1
- Date: Mon, 12 Jan 2026 12:15:30 GMT
- Title: From Sketch to Fresco: Efficient Diffusion Transformer with Progressive Resolution
- Authors: Shikang Zheng, Guantao Chen, Lixuan He, Jiacheng Liu, Yuqi Lin, Chang Zou, Linfeng Zhang
- Abstract summary: Diffusion Transformers achieve impressive generative quality but remain expensive due to iterative sampling. We propose \textbf{Fresco}, a dynamic resolution framework that unifies re-noising and global structure across stages with progressive upsampling.
- Score: 11.05647700476321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformers achieve impressive generative quality but remain computationally expensive due to iterative sampling. Recently, dynamic resolution sampling has emerged as a promising acceleration technique by reducing the resolution of early sampling steps. However, existing methods rely on heuristic re-noising at every resolution transition, injecting noise that breaks cross-stage consistency and forces the model to relearn global structure. In addition, these methods indiscriminately upsample the entire latent space at once without checking which regions have actually converged, causing accumulated errors and visible artifacts. Therefore, we propose \textbf{Fresco}, a dynamic resolution framework that unifies re-noising and global structure across stages with progressive upsampling, preserving both the efficiency of low-resolution drafting and the fidelity of high-resolution refinement, with all stages aligned toward the same final target. Fresco achieves near-lossless acceleration across diverse domains and models, including 10$\times$ speedup on FLUX and 5$\times$ on HunyuanVideo, while remaining orthogonal to distillation, quantization and feature caching, reaching 22$\times$ speedup when combined with distilled models. Our code is in the supplementary material and will be released on GitHub.
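The core idea of dynamic-resolution sampling described in the abstract can be sketched as follows. This is a minimal illustration, not Fresco's actual implementation: `denoise_step` is a hypothetical stand-in for a real DiT denoiser, and nearest-neighbor upsampling stands in for whatever latent upsampling the paper uses.

```python
import numpy as np

def upsample_latent(z, factor=2):
    """Nearest-neighbor latent upsampling; real pipelines typically
    use bilinear/bicubic interpolation in latent space."""
    return np.repeat(np.repeat(z, factor, axis=0), factor, axis=1)

def progressive_resolution_sample(denoise_step, steps, base_res=16, stages=2):
    """Draft at low resolution, then upsample the partially denoised
    latent between stages (rather than re-noising it) so global
    structure carries over into high-resolution refinement."""
    z = np.random.default_rng(0).standard_normal((base_res, base_res))
    for stage, stage_steps in enumerate(np.array_split(np.arange(steps), stages)):
        for t in stage_steps:
            z = denoise_step(z, t)
        if stage < stages - 1:  # transition: grow resolution, keep content
            z = upsample_latent(z)
    return z
```

With two stages starting at a 16x16 latent, the sample finishes at 32x32; early steps run on a quarter of the tokens, which is where the savings come from.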
Related papers
- Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration [58.19554276924402]
We propose spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality compared with the baselines.
arXiv Detail & Related papers (2026-03-02T08:59:11Z) - D$^2$-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy [7.553742541566094]
Integration of diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering strong perceptual quality. We propose \textbf{D$^2$-VR}, a single-image diffusion-based video-restoration framework with low-step inference.
arXiv Detail & Related papers (2026-02-09T08:52:51Z) - OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models [5.2258248597807535]
The computational cost of Diffusion Transformers, stemming from a large number of sampling steps and complex per-step computations, presents significant challenges for real-time deployment. We introduce OmniCache, a training-free acceleration method that exploits the global redundancy inherent in the denoising process.
arXiv Detail & Related papers (2025-08-22T08:36:58Z) - Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models [53.087070073434845]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face image quality degradation under a low-latency budget. We propose the Ensemble Parallel Direction solver (dubbed as ours), a novel ODE solver that mitigates truncation errors by incorporating multiple parallel gradient evaluations in each ODE step.
arXiv Detail & Related papers (2025-07-20T03:08:06Z) - Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers [9.875073051988057]
Region-Adaptive Latent Upsampling (RALU) is a training-free framework that accelerates inference along the spatial dimension. RALU performs mixed-resolution sampling across three stages: 1) low-resolution denoising in latent diffusion to efficiently capture global semantic structure, 2) region-adaptive upsampling of specific regions prone to artifacts at full resolution, and 3) all-latent upsampling at full resolution for detail refinement. Our method significantly reduces computation while preserving image quality, achieving up to 7.0$\times$ speed-up on FLUX and 3.0$\times$ on Stable Diffusion 3 with minimal degradation.
arXiv Detail & Related papers (2025-07-11T09:07:43Z) - Self-Cascaded Diffusion Models for Arbitrary-Scale Image Super-Resolution [9.322053509028832]
We present CasArbi, a self-cascaded diffusion framework for arbitrary-scale image super-resolution. Our novel coordinate-guided residual diffusion model allows for the learning of continuous image representations. Our experiments demonstrate that CasArbi outperforms prior art in both perceptual and distortion performance metrics.
arXiv Detail & Related papers (2025-06-09T14:43:21Z) - Training-free Diffusion Acceleration with Bottleneck Sampling [37.9135035506567]
Bottleneck Sampling is a training-free framework that leverages low-resolution priors to reduce computational overhead while preserving output fidelity. It accelerates inference by up to 3$\times$ for image generation and 2.5$\times$ for video generation, all while maintaining output quality comparable to the standard full-resolution sampling process.
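A back-of-the-envelope calculation shows why low-resolution early steps save so much: self-attention cost grows roughly quadratically with token count. The step and token counts below are illustrative, not figures from the paper:

```python
def attention_cost(tokens):
    # Self-attention FLOPs scale roughly with tokens**2
    return tokens ** 2

def schedule_cost(schedule):
    """Total cost of a sampling schedule given (steps, tokens) per stage."""
    return sum(steps * attention_cost(tokens) for steps, tokens in schedule)

full = schedule_cost([(50, 4096)])                    # every step at full resolution
bottleneck = schedule_cost([(30, 1024), (20, 4096)])  # low-res draft, high-res finish
speedup = full / bottleneck                           # roughly 2.3x on attention alone
```

Running 30 of 50 steps at a quarter of the tokens already yields a ~2.3x attention speedup under this model, in the same ballpark as the reported 2.5-3x figures.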
arXiv Detail & Related papers (2025-03-24T17:59:02Z) - Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling [50.34513854725803]
Arbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors. We propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting.
arXiv Detail & Related papers (2025-03-09T13:43:57Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner [112.99126045081046]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from slow inference speed. We propose a \textbf{timestep tuner} that helps find a more accurate integral direction for a particular interval at minimum cost. Experiments show that our plug-in design can be trained efficiently and boosts the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates training convergence by 4$\times$ and generates samples of similar quality 800$\times$ faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Hessian-Free High-Resolution Nesterov Acceleration for Sampling [55.498092486970364]
Nesterov's Accelerated Gradient (NAG) for optimization has better performance than its continuous-time limit (noiseless kinetic Langevin) when a finite step-size is employed.
This work explores the sampling counterpart of this phenomenon and proposes a diffusion process whose discretizations can yield accelerated gradient-based MCMC methods.
arXiv Detail & Related papers (2020-06-16T15:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.