CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
- URL: http://arxiv.org/abs/2507.15260v1
- Date: Mon, 21 Jul 2025 05:48:47 GMT
- Title: CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
- Authors: Jiaqi Han, Haotian Ye, Puheng Li, Minkai Xu, James Zou, Stefano Ermon
- Abstract summary: Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
- Score: 72.23291099555459
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. Existing acceleration techniques either require extensive model retraining or compromise significantly on sample quality. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. Our framework views multi-core diffusion sampling as an ODE solver pipeline, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core communication mechanism. This motivates our multi-core training-free diffusion sampling accelerator, CHORDS, which is compatible with various diffusion samplers, model architectures, and modalities. Through extensive experiments, CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation. This advancement enables CHORDS to establish a solid foundation for real-time, high-fidelity diffusion generation.
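The abstract's core idea, viewing multi-core sampling as an ODE solver pipeline in which slower, accurate solvers progressively rectify faster ones, can be illustrated on a toy ODE. The sketch below is not CHORDS's actual inter-core communication mechanism (the paper's rule is parallel and theoretically derived); it is a minimal serial analogy, with all function names, step counts, and tolerances chosen for illustration only:

```python
import math

# Toy ODE: dx/dt = -x, with exact solution x(t) = x0 * exp(-t).
def f(x, t):
    return -x

def euler_solve(x0, t0, t1, n_steps):
    """Fixed-step Euler integration of dx/dt = f(x, t) from t0 to t1."""
    x, t = x0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x

def hierarchical_solve(x0, t_grid, fast_steps=2, slow_steps=40, tol=1e-3):
    """Serial analogy of a fast/slow solver pipeline: a cheap solver
    drafts the whole trajectory, then an accurate solver re-solves each
    segment and rectifies the draft whenever it drifts beyond tol."""
    # "Fast core": draft every segment with very few Euler steps.
    drafts = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        drafts.append(euler_solve(drafts[-1], t0, t1, fast_steps))
    # "Slow core": re-solve each segment accurately from the trusted state.
    x = x0
    for i, (t0, t1) in enumerate(zip(t_grid[:-1], t_grid[1:])):
        accurate = euler_solve(x, t0, t1, slow_steps)
        # Accept the cheap draft only when it agrees with the accurate state;
        # otherwise rectify by continuing from the accurate state.
        x = drafts[i + 1] if abs(accurate - drafts[i + 1]) <= tol else accurate
    return x

t_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
x_final = hierarchical_solve(1.0, t_grid)
print(abs(x_final - math.exp(-2.0)))  # error well under 1e-2
```

In the real accelerator the draft and rectification computations run concurrently on separate cores, so accepted drafts translate into wall-clock speedup; this serial version only shows the accept-or-rectify logic.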
Related papers
- Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration [58.19554276924402]
We propose spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality compared with the baselines.
arXiv Detail & Related papers (2026-03-02T08:59:11Z) - Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models [32.70019265781621]
Text-to-image diffusion models have achieved unprecedented success but still struggle to produce high-quality images under limited sampling budgets. We propose the constant total rotation schedule (TORS), a scheduling strategy that ensures uniform geometric variation along the sampling trajectory. TORS outperforms previous training-free acceleration methods and produces high-quality images with 10 sampling steps on Flux.1-Dev and Stable Diffusion 3.5.
arXiv Detail & Related papers (2026-02-28T18:09:44Z) - Planned Diffusion [57.74615417331808]
A central challenge in large language model inference is the trade-off between generation speed and output quality. We propose planned diffusion, a hybrid method that combines the strengths of both paradigms. Planned diffusion works in two stages: first, the model creates a short autoregressive plan that breaks the output into smaller, independent spans.
arXiv Detail & Related papers (2025-10-20T20:27:48Z) - Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency [60.74505433956616]
The continuous-time consistency model (sCM) is theoretically principled and empirically powerful for accelerating academic-scale diffusion. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. We propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer.
arXiv Detail & Related papers (2025-10-09T16:45:30Z) - Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation [19.010105652612616]
Hyper-Bagel is designed to simultaneously speed up both multimodal understanding and generation tasks. For generative tasks, our resulting 6-NFE model yields a 16.67x speedup in text-to-image generation and a 22x speedup in image editing.
arXiv Detail & Related papers (2025-09-23T09:12:46Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z) - SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity [4.6126713437495495]
We present a novel diffusion model accelerator featuring a mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector. Our accelerator achieves 6.91x speed-up and 51.5% energy reduction compared to traditional dense accelerators.
arXiv Detail & Related papers (2025-01-26T08:34:26Z) - PQD: Post-training Quantization for Efficient Diffusion Models [4.809939957401427]
We propose a novel post-training quantization for diffusion models (PQD). We show that our proposed method is able to directly quantize full-precision diffusion models into 8-bit or 4-bit models while maintaining comparable performance in a training-free manner.
arXiv Detail & Related papers (2024-12-30T19:55:59Z) - TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z) - Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule [50.260693393896716]
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Recent techniques have been employed to automatically search for faster generation processes. We introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models.
arXiv Detail & Related papers (2024-09-26T06:28:05Z) - Memory-Efficient Fine-Tuning for Quantized Diffusion Model [12.875837358532422]
We introduce TuneQDM, a memory-efficient fine-tuning method for quantized diffusion models.
Our method consistently outperforms the baseline in both single-/multi-subject generations.
arXiv Detail & Related papers (2024-01-09T03:42:08Z) - Boosting Latent Diffusion with Flow Matching [22.68317748373856]
Flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis. State-of-the-art high-resolution image synthesis is achieved at $1024^2$ pixels with minimal computational cost.
arXiv Detail & Related papers (2023-12-12T15:30:24Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.