CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
- URL: http://arxiv.org/abs/2507.15260v1
- Date: Mon, 21 Jul 2025 05:48:47 GMT
- Title: CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers
- Authors: Jiaqi Han, Haotian Ye, Puheng Li, Minkai Xu, James Zou, Stefano Ermon
- Abstract summary: Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
- Score: 72.23291099555459
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. Existing acceleration techniques either require extensive model retraining or compromise significantly on sample quality. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. Our framework views multi-core diffusion sampling as an ODE solver pipeline, where slower yet accurate solvers progressively rectify faster solvers through a theoretically justified inter-core communication mechanism. This motivates our multi-core training-free diffusion sampling accelerator, CHORDS, which is compatible with various diffusion samplers, model architectures, and modalities. Through extensive experiments, CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation. This advancement enables CHORDS to establish a solid foundation for real-time, high-fidelity diffusion generation.
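The abstract's core idea, viewing multi-core sampling as an ODE solver pipeline in which slower, accurate solvers progressively rectify faster ones, can be illustrated on a toy ODE. The sketch below is not CHORDS's actual inter-core communication mechanism (the paper's rule is parallel and theoretically derived); it is a minimal serial analogy, with all function names, step counts, and tolerances chosen for illustration only:

```python
import math

# Toy ODE: dx/dt = -x, with exact solution x(t) = x0 * exp(-t).
def f(x, t):
    return -x

def euler_solve(x0, t0, t1, n_steps):
    """Fixed-step Euler integration of dx/dt = f(x, t) from t0 to t1."""
    x, t = x0, t0
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x, t)
        t += h
    return x

def hierarchical_solve(x0, t_grid, fast_steps=2, slow_steps=40, tol=1e-3):
    """Serial analogy of a fast/slow solver pipeline: a cheap solver
    drafts the whole trajectory, then an accurate solver re-solves each
    segment and rectifies the draft whenever it drifts beyond tol."""
    # "Fast core": draft every segment with very few Euler steps.
    drafts = [x0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        drafts.append(euler_solve(drafts[-1], t0, t1, fast_steps))
    # "Slow core": re-solve each segment accurately from the trusted state.
    x = x0
    for i, (t0, t1) in enumerate(zip(t_grid[:-1], t_grid[1:])):
        accurate = euler_solve(x, t0, t1, slow_steps)
        # Accept the cheap draft only when it agrees with the accurate state;
        # otherwise rectify by continuing from the accurate state.
        x = drafts[i + 1] if abs(accurate - drafts[i + 1]) <= tol else accurate
    return x

t_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
x_final = hierarchical_solve(1.0, t_grid)
print(abs(x_final - math.exp(-2.0)))  # error well under 1e-2
```

In the real accelerator the draft and rectification computations run concurrently on separate cores, so accepted drafts translate into wall-clock speedup; this serial version only shows the accept-or-rectify logic.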
Related papers
- Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration [58.19554276924402]
We propose spectral diffusion feature forecaster (Spectrum) to enable global, long-range feature reuse with tightly controlled error. We achieve up to 4.79$\times$ speedup on FLUX.1 and 4.67$\times$ speedup on Wan2.1-14B, while maintaining much higher sample quality compared with the baselines.
arXiv Detail & Related papers (2026-03-02T08:59:11Z) - Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models [32.70019265781621]
Text-to-image diffusion models have achieved unprecedented success but still struggle to produce high-quality images under limited sampling budgets. We propose the constant total rotation schedule (TORS), a scheduling strategy that ensures uniform geometric variation along the sampling trajectory. TORS outperforms previous training-free acceleration methods and produces high-quality images with 10 sampling steps on Flux.1-Dev and Stable Diffusion 3.5.
arXiv Detail & Related papers (2026-02-28T18:09:44Z) - Planned Diffusion [57.74615417331808]
A central challenge in large language model inference is the trade-off between generation speed and output quality. We propose planned diffusion, a hybrid method that combines the strengths of both paradigms. Planned diffusion works in two stages: first, the model creates a short autoregressive plan that breaks the output into smaller, independent spans.
arXiv Detail & Related papers (2025-10-20T20:27:48Z) - Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency [60.74505433956616]
The continuous-time consistency model (sCM) is theoretically principled and empirically powerful for accelerating academic-scale diffusion. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. We propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer.
arXiv Detail & Related papers (2025-10-09T16:45:30Z) - Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation [19.010105652612616]
Hyper-Bagel is designed to simultaneously speed up both multimodal understanding and generation tasks. For generative tasks, our resulting 6-NFE model yields a 16.67x speedup in text-to-image generation and a 22x speedup in image editing.
arXiv Detail & Related papers (2025-09-23T09:12:46Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z) - SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity [4.6126713437495495]
We present a novel diffusion model accelerator featuring a mixed-precision dense-sparse architecture, channel-last address mapping, and a time-step-aware sparsity detector. Our accelerator achieves 6.91x speed-up and 51.5% energy reduction compared to traditional dense accelerators.
arXiv Detail & Related papers (2025-01-26T08:34:26Z) - PQD: Post-training Quantization for Efficient Diffusion Models [4.809939957401427]
We propose a novel post-training quantization for diffusion models (PQD). We show that our proposed method is able to directly quantize full-precision diffusion models into 8-bit or 4-bit models while maintaining comparable performance in a training-free manner.
arXiv Detail & Related papers (2024-12-30T19:55:59Z) - TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models [49.65286242048452]
We propose a novel method dubbed Timestep-Channel Adaptive Quantization for Diffusion Models (TCAQ-DM). The proposed method substantially outperforms the state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2024-12-21T16:57:54Z) - Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule [50.260693393896716]
Diffusion models are cutting-edge generative models adept at producing diverse, high-quality images. Recent techniques have been employed to automatically search for faster generation processes. We introduce Flexiffusion, a novel training-free NAS paradigm designed to accelerate diffusion models.
arXiv Detail & Related papers (2024-09-26T06:28:05Z) - Memory-Efficient Fine-Tuning for Quantized Diffusion Model [12.875837358532422]
We introduce TuneQDM, a memory-efficient fine-tuning method for quantized diffusion models.
Our method consistently outperforms the baseline in both single-/multi-subject generations.
arXiv Detail & Related papers (2024-01-09T03:42:08Z) - Boosting Latent Diffusion with Flow Matching [22.68317748373856]
Flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate that introducing flow matching between a frozen diffusion model and a convolutional decoder enables high-resolution image synthesis. State-of-the-art high-resolution image synthesis is achieved at $1024^2$ pixels with minimal computational cost.
arXiv Detail & Related papers (2023-12-12T15:30:24Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.