Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
- URL: http://arxiv.org/abs/2510.04188v1
- Date: Sun, 05 Oct 2025 13:01:08 GMT
- Title: Let Features Decide Their Own Solvers: Hybrid Feature Caching for Diffusion Transformers
- Authors: Shikang Zheng, Guantao Chen, Qinming Zhou, Yuqi Lin, Lixuan He, Chang Zou, Peiliang Cai, Jiacheng Liu, Linfeng Zhang
- Abstract summary: Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. We introduce HyCa, a Hybrid ODE solver inspired caching framework that applies dimension-wise caching strategies.
- Score: 10.215762814937277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Transformers offer state-of-the-art fidelity in image and video synthesis, but their iterative sampling process remains a major bottleneck due to the high cost of transformer forward passes at each timestep. To mitigate this, feature caching has emerged as a training-free acceleration technique that reuses or forecasts hidden representations. However, existing methods often apply a uniform caching strategy across all feature dimensions, ignoring their heterogeneous dynamic behaviors. Therefore, we adopt a new perspective by modeling hidden feature evolution as a mixture of ODEs across dimensions, and introduce HyCa, a Hybrid ODE solver inspired caching framework that applies dimension-wise caching strategies. HyCa achieves near-lossless acceleration across diverse domains and models, including 5.55 times speedup on FLUX, 5.56 times speedup on HunyuanVideo, 6.24 times speedup on Qwen-Image and Qwen-Image-Edit without retraining.
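The dimension-wise idea in the abstract can be illustrated with a small sketch: each hidden dimension gets its own cheap "solver", either reusing the cached value (zeroth-order hold) or linearly extrapolating from the last two cached values (first-order forecast), chosen by how fast that dimension has been changing. The function names, the threshold, and the selection rule below are illustrative assumptions, not HyCa's actual algorithm.

```python
def assign_solvers(prev, curr, threshold=0.05):
    """Per-dimension solver labels from two cached feature snapshots:
    nearly static dimensions are reused, fast-moving ones are forecast."""
    return ["reuse" if abs(c - p) < threshold else "forecast"
            for p, c in zip(prev, curr)]

def predict_next(prev, curr, solvers):
    """Approximate the next timestep's features without a model forward pass."""
    out = []
    for p, c, s in zip(prev, curr, solvers):
        if s == "reuse":
            out.append(c)            # zeroth-order: hold the cached value
        else:
            out.append(c + (c - p))  # first-order: linear extrapolation
    return out

prev = [0.10, 0.50, -0.20]           # features cached at step t-1
curr = [0.11, 0.80, -0.60]           # features cached at step t
solvers = assign_solvers(prev, curr)  # ['reuse', 'forecast', 'forecast']
approx = predict_next(prev, curr, solvers)
print(solvers)
print(approx)
```

The point of the mixture is that a single uniform rule (all-reuse or all-forecast) would either drift on the fast dimensions or waste accuracy on the slow ones.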
Related papers
- Flow caching for autoregressive video generation [72.10021661412364]
We present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our method achieves remarkable speedups of 2.38 times on MAGI-1 and 6.7 times on SkyReels-V2, with negligible quality degradation.
arXiv Detail & Related papers (2026-02-11T13:11:04Z) - PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache [61.57938553036056]
We introduce PackCache, a training-free KV-cache management method that compacts the KV cache through three coordinated mechanisms. In terms of efficiency, PackCache accelerates end-to-end generation by 1.7-2.2x on 48-frame long sequences.
arXiv Detail & Related papers (2026-01-07T19:51:06Z) - H2-Cache: A Novel Hierarchical Dual-Stage Cache for High-Performance Acceleration of Generative Diffusion Models [7.8812023976358425]
H2-cache is a novel hierarchical caching mechanism designed for modern generative diffusion model architectures. Our method is founded on the key insight that the denoising process can be functionally separated into a structure-defining stage and a detail-refining stage. Experiments on the Flux architecture demonstrate that H2-cache achieves significant acceleration (up to 5.08x) while maintaining image quality nearly identical to the baseline.
arXiv Detail & Related papers (2025-10-31T04:47:14Z) - DiCache: Let Diffusion Model Determine Its Own Cache [62.954717254728166]
DiCache is a training-free adaptive caching strategy for accelerating diffusion models at runtime. An Online Probe Profiling Scheme leverages a shallow-layer online probe to obtain an on-the-fly indicator for the caching error in real time. Dynamic Cache Trajectory Alignment approximates the deep-layer feature output from multi-step historical caches.
arXiv Detail & Related papers (2025-08-24T13:30:00Z) - Forecast then Calibrate: Feature Caching as ODE for Efficient Diffusion Transformers [19.107716099809707]
Diffusion Transformers (DiTs) have demonstrated exceptional performance in high-fidelity image and video generation. Current methods often struggle to maintain generation quality at high acceleration ratios. We propose FoCa, which treats feature caching as a feature-ODE solving problem.
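The forecast-then-calibrate pattern reads naturally as a toy sampling loop: most timesteps use a cheap extrapolated estimate of the features, and a real forward pass every few steps corrects accumulated drift. The linear stand-in "model" and the fixed calibration schedule below are illustrative assumptions, not FoCa's method.

```python
def toy_forward(t):
    """Stand-in for an expensive DiT forward pass: features decay linearly in t."""
    return [x * (1.0 - 0.1 * t) for x in (1.0, -2.0, 0.5)]

def forecast(prev, curr):
    """Treat the last cached difference as a derivative estimate and take
    one Euler-like step (first-order extrapolation)."""
    return [c + (c - p) for p, c in zip(prev, curr)]

def sample(num_steps=8, calibrate_every=3):
    """Run the denoising loop, replacing most forward passes with forecasts."""
    history = [toy_forward(0), toy_forward(1)]  # two real passes to seed
    full_passes = 2
    for t in range(2, num_steps):
        if t % calibrate_every == 0:
            feats = toy_forward(t)              # calibrate: real forward pass
            full_passes += 1
        else:
            feats = forecast(history[-2], history[-1])
        history.append(feats)
    return history[-1], full_passes

final_feats, num_full = sample()
print(num_full, final_feats)   # 4 real passes instead of 8
```

Because the toy dynamics are linear, the extrapolation here is exact; in a real DiT the forecast drifts, which is what the periodic calibration passes are for.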
arXiv Detail & Related papers (2025-08-22T08:34:03Z) - MixCache: Mixture-of-Cache for Video Diffusion Transformer Acceleration [15.22288174114487]
Caching is a widely adopted optimization method in DiT models. We propose MixCache, a training-free caching-based framework for efficient video DiT inference.
arXiv Detail & Related papers (2025-08-18T07:49:33Z) - CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z) - QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation [84.91431271257437]
Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation. DiTs come with significant drawbacks, including increased computational and memory costs. We propose QuantCache, a novel training-free inference acceleration framework.
arXiv Detail & Related papers (2025-03-09T10:31:51Z) - CacheQuant: Comprehensively Accelerated Diffusion Models [3.78219736760145]
CacheQuant is a novel training-free paradigm that comprehensively accelerates diffusion models by jointly optimizing model caching and quantization techniques. Experimental results show that CacheQuant achieves a 5.18x speedup and 4x compression for Stable Diffusion on MS-COCO, with only a 0.02 loss in CLIP score.
arXiv Detail & Related papers (2025-03-03T09:04:51Z) - SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers [4.7170474122879575]
Diffusion Transformers (DiT) have emerged as powerful generative models for various tasks, including image, video, and speech synthesis. We introduce SmoothCache, a model-agnostic inference acceleration technique for DiT architectures. Our experiments demonstrate that SmoothCache achieves 8% to 71% speedup while maintaining or even improving generation quality across diverse modalities.
arXiv Detail & Related papers (2024-11-15T16:24:02Z) - Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching [56.286064975443026]
We make an interesting and somewhat surprising observation: through a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed even without updating the model parameters.
We introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers.
Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
arXiv Detail & Related papers (2024-06-03T18:49:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.