Related papers: ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

URL: http://arxiv.org/abs/2412.06163v1
Date: Mon, 09 Dec 2024 02:51:24 GMT
Title: ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance
Authors: Yuming Li, Peidong Jia, Daiwei Hong, Yueru Jia, Qi She, Rui Zhao, Ming Lu, Shanghang Zhang
Abstract summary: Training-free high-resolution (HR) image generation has garnered significant attention due to the high costs of training large diffusion models. We introduce ASGDiffusion for parallel HR generation with Asynchronous Structure Guidance (ASG) using pre-trained diffusion models. Our method effectively and efficiently addresses common issues like pattern repetition and achieves state-of-the-art HR generation.
Score: 30.190913570076525
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Training-free high-resolution (HR) image generation has garnered significant attention due to the high costs of training large diffusion models. Most existing methods begin by reconstructing the overall structure and then proceed to refine the local details. Despite their advancements, they still face issues with repetitive patterns in HR image generation. Besides, HR generation with diffusion models incurs significant computational costs. Thus, parallel generation is essential for interactive applications. To solve the above limitations, we introduce a novel method named ASGDiffusion for parallel HR generation with Asynchronous Structure Guidance (ASG) using pre-trained diffusion models. To solve the pattern repetition problem of HR image generation, ASGDiffusion leverages the low-resolution (LR) noise weighted by the attention mask as the structure guidance for the denoising step to ensure semantic consistency. The proposed structure guidance can significantly alleviate the pattern repetition problem. To enable parallel generation, we further propose a parallelism strategy, which calculates the patch noises and structure guidance asynchronously. By leveraging multi-GPU parallel acceleration, we significantly accelerate generation speed and reduce memory usage per GPU. Extensive experiments demonstrate that our method effectively and efficiently addresses common issues like pattern repetition and achieves state-of-the-art HR generation.
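
The structure-guidance idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the blend equation, so the rule below (a per-pixel linear interpolation between the HR patch noise and the upsampled LR noise, weighted by the attention mask) is an assumption, and the names `upsample_nearest`, `guide_noise`, and `strength` are hypothetical. Nearest-neighbour upsampling is also assumed. The asynchronous multi-GPU scheduling is not shown.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def guide_noise(hr_noise, lr_noise, attn_mask, factor, strength=1.0):
    """Blend upsampled LR noise into the HR noise, weighted by an attention mask.

    Hypothetical blend rule (an assumption, not the paper's equation):
        guided = (1 - w) * hr + w * up(lr),  with  w = strength * up(mask)
    """
    lr_up = upsample_nearest(lr_noise, factor)
    mask_up = upsample_nearest(attn_mask, factor)
    guided = []
    for hr_row, lr_row, m_row in zip(hr_noise, lr_up, mask_up):
        guided.append([
            (1.0 - strength * m) * h + strength * m * l
            for h, l, m in zip(hr_row, lr_row, m_row)
        ])
    return guided


# Toy inputs: a 1x2 LR noise map guiding a 2x4 HR noise map.
lr_noise = [[1.0, -1.0]]
attn_mask = [[1.0, 0.0]]  # left half fully guided, right half untouched
hr_noise = [[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]]
guided = guide_noise(hr_noise, lr_noise, attn_mask, factor=2)
```

Where the mask is 1, the HR noise is replaced by the LR noise, which carries the global structure. Where the mask is 0, the HR patch noise is left unchanged so that local detail is preserved.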

Related papers

Causal Autoregressive Diffusion Language Model [70.7353007255797]
CARD reformulates the diffusion process within a strictly causal attention mask, enabling dense, per-token supervision in a single forward pass. Our results demonstrate that CARD achieves ARM-level data efficiency while unlocking the latency benefits of parallel generation.
arXiv Detail & Related papers (2026-01-29T17:38:29Z)
GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation [77.13582457917418]
We train a generative model solely on grid images comprising subsampled frames. We learn to generate image sequences, using the strong self-attention mechanism of the Diffusion Transformer (DiT) to capture correlations between frames. Our method consistently outperforms SoTA in quality and inference speed (at least twice as fast) across datasets.
arXiv Detail & Related papers (2025-12-24T16:46:04Z)
Uniform Discrete Diffusion with Metric Path for Video Generation [103.86033350602908]
Continuous-space video generation has advanced rapidly, while discrete approaches lag behind due to error accumulation and long-duration inconsistency. We present Uniform discRete diffuSion with metric pAth (URSA), a powerful framework that bridges the gap with continuous approaches for scalable video generation. URSA consistently outperforms existing discrete methods and achieves performance comparable to state-of-the-art continuous diffusion methods.
arXiv Detail & Related papers (2025-10-28T17:59:57Z)
STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs [14.137795556562686]
This paper introduces Spatio-Temporal Adaptive Diffusion Inference (STADI), a novel framework to accelerate diffusion model inference. At its core is a hybrid scheduler that orchestrates fine-grained parallelism across both temporal and spatial dimensions. Our method significantly reduces end-to-end inference latency by up to 45% and significantly improves resource utilization on heterogeneous GPUs.
arXiv Detail & Related papers (2025-09-05T00:25:40Z)
Self-Reflective Reinforcement Learning for Diffusion-based Image Reasoning Generation [24.247140501653547]
Diffusion models have recently demonstrated exceptional performance in image generation tasks. We propose SRRL, a self-reflective RL algorithm for diffusion models to achieve reasoning generation of logical images.
arXiv Detail & Related papers (2025-05-28T14:37:21Z)
Fast Autoregressive Models for Continuous Latent Generation [49.079819389916764]
Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP. Recent work, the masked autoregressive model (MAR), bypasses quantization by modeling per-token distributions in continuous spaces using a diffusion head. We propose Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head.
arXiv Detail & Related papers (2025-04-24T13:57:08Z)
Unifying Autoregressive and Diffusion-Based Sequence Generation [2.3923884480793673]
We present extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. First, we introduce hyperschedules, which assign distinct noise schedules to individual token positions. Second, we propose two hybrid token-wise noising processes that interpolate between absorbing and uniform processes, enabling the model to fix past mistakes.
arXiv Detail & Related papers (2025-04-08T20:32:10Z)
DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction. Few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z)
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework. Unlike discretization-based methods, MMAR takes in continuous-valued image tokens to avoid information loss. We show that MMAR demonstrates superior performance compared with other joint multi-modal models.
arXiv Detail & Related papers (2024-10-14T17:57:18Z)
Edge-preserving noise for diffusion models [4.435514696080208]
We present a novel edge-preserving diffusion model that generalizes over existing isotropic models. We show that our model's generative process converges faster to results that more closely match the target distribution. Our edge-preserving diffusion process consistently outperforms state-of-the-art baselines in unconditional image generation.
arXiv Detail & Related papers (2024-10-02T13:29:52Z)
Diff-INR: Generative Regularization for Electrical Impedance Tomography [6.7667436349597985]
Electrical Impedance Tomography (EIT) reconstructs conductivity distributions within a body from boundary measurements. EIT reconstruction is hindered by its ill-posed nonlinear inverse problem, which complicates accurate results. We propose Diff-INR, a novel method that combines generative regularization with Implicit Neural Representations (INR) through a diffusion model.
arXiv Detail & Related papers (2024-09-06T14:21:23Z)
One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling. OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder. It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
Iterative Token Evaluation and Refinement for Real-World Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations. We propose an Iterative Token Evaluation and Refinement framework for RWSR. We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
Loop Unrolled Shallow Equilibrium Regularizer (LUSER) -- A Memory-Efficient Inverse Problem Solver [26.87738024952936]
In inverse problems we aim to reconstruct some underlying signal of interest from potentially corrupted and often ill-posed measurements. We propose a loop unrolled (LU) algorithm with shallow equilibrium regularizers (LUSER). These implicit models are as expressive as deeper convolutional networks, but far more memory efficient during training.
arXiv Detail & Related papers (2022-10-10T19:50:37Z)
Denoising Diffusion Restoration Models [110.1244240726802]
Denoising Diffusion Restoration Models (DDRM) is an efficient, unsupervised posterior sampling method. We demonstrate DDRM's versatility on several image datasets for super-resolution, deblurring, inpainting, and colorization.
arXiv Detail & Related papers (2022-01-27T20:19:07Z)
Phase Retrieval using Expectation Consistent Signal Recovery Algorithm based on Hypernetwork [73.94896986868146]
Phase retrieval (PR) is an important component in modern computational imaging systems. Recent advances in deep learning have opened up a new possibility for robust and fast PR. We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution [31.934084942626257]
We propose a deep Super-Resolution Residual Convolutional Generative Adversarial Network (SRResCGAN). It follows the real-world degradation settings by adversarially training the model with pixel-wise supervision in the HR domain from its generated LR counterpart. The proposed network exploits residual learning by minimizing the energy-based objective function with powerful image regularization and convex optimization techniques.
arXiv Detail & Related papers (2020-05-03T00:12:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.