SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models
- URL: http://arxiv.org/abs/2602.02544v1
- Date: Fri, 30 Jan 2026 05:22:44 GMT
- Title: SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models
- Authors: Wenhao Sun, Rong-Cheng Tu, Yifu Ding, Zhao Jin, Jingyi Liao, Yongcheng Jing, Dacheng Tao,
- Abstract summary: We present SPA-Cache, which jointly optimizes update identification and budget allocation in DLM caching. First, we derive a low-dimensional singular proxy that enables identification of update-critical tokens in a compact subspace. Second, we introduce an adaptive strategy that allocates fewer updates to stable layers without degrading generation quality.
- Score: 56.45983529954998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal nature precludes standard KV caching, forcing costly hidden-state recomputation at every decoding step. Existing DLM caching approaches reduce this cost through selective hidden-state updates; however, they are still limited by (i) costly token-wise update-identification heuristics and (ii) rigid, uniform budget allocation that fails to account for heterogeneous hidden-state dynamics. To address these challenges, we present SPA-Cache, which jointly optimizes update identification and budget allocation in DLM caching. First, we derive a singular proxy that enables the identification of update-critical tokens in a low-dimensional subspace, substantially reducing the overhead of update identification. Second, we introduce an adaptive strategy that allocates fewer updates to stable layers without degrading generation quality. Together, these contributions significantly improve the efficiency of DLMs, yielding up to an $8\times$ throughput improvement over vanilla decoding and a $2$--$4\times$ speedup over existing caching baselines.
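To make the two ideas above concrete, the following minimal sketch (not the authors' released implementation; the function names, SVD rank, and budget values are assumptions) illustrates how a low-dimensional singular proxy could score per-token hidden-state drift to pick update-critical tokens, and how per-layer update budgets could be allocated adaptively from recent drift.

```python
# Illustrative sketch only: identify "update-critical" tokens by measuring
# hidden-state drift in a low-dimensional subspace obtained from a truncated
# SVD, then refresh only the top-k tokens. Stable layers with small drift
# receive fewer refreshes under the adaptive budget split.
import numpy as np

def singular_proxy(hidden_prev: np.ndarray, rank: int = 8) -> np.ndarray:
    """Projection onto the top-`rank` right singular directions of the
    previous step's hidden states (shape: [num_tokens, hidden_dim])."""
    _, _, vt = np.linalg.svd(hidden_prev, full_matrices=False)
    return vt[:rank].T                              # [hidden_dim, rank]

def select_tokens_to_update(hidden_prev, hidden_curr, budget, rank=8):
    """Pick the `budget` tokens whose hidden states changed most, measured
    in the proxy subspace rather than the full hidden dimension."""
    proj = singular_proxy(hidden_prev, rank)        # [hidden_dim, rank]
    drift = (hidden_curr - hidden_prev) @ proj      # [num_tokens, rank]
    scores = np.linalg.norm(drift, axis=-1)         # per-token change proxy
    return np.argsort(scores)[-budget:]             # tokens to refresh

def allocate_layer_budgets(layer_drift: np.ndarray, total_budget: int):
    """Adaptive allocation sketch: stable layers (small recent drift) get
    fewer updates; rounding may slightly perturb the exact total."""
    weights = layer_drift / layer_drift.sum()
    return np.maximum(1, np.round(weights * total_budget)).astype(int)

# Toy usage: random tensors stand in for two consecutive decoding steps.
rng = np.random.default_rng(0)
prev = rng.normal(size=(128, 1024))                 # [tokens, hidden_dim]
curr = prev + 0.01 * rng.normal(size=prev.shape)    # small step-to-step drift
curr[[3, 17, 99]] += 0.5                            # a few tokens change a lot
print(select_tokens_to_update(prev, curr, budget=3))  # should recover 3, 17, 99
```

This is a sketch under the stated assumptions, not SPA-Cache's actual algorithm; the key point it conveys is that scoring drift after projecting into a small singular subspace makes update identification far cheaper than comparing full hidden states.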
Related papers
- SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching [75.02865981328509]
Caching reduces computation by reusing previously computed model outputs across timesteps. We propose Sensitivity-Aware Caching (SenCache), a dynamic caching policy that adaptively selects caching timesteps on a per-sample basis. SenCache achieves better visual quality than existing caching methods under similar computational budgets.
arXiv Detail & Related papers (2026-02-27T17:36:09Z) - Attention Is All You Need for KV Cache in Diffusion LLMs [36.94369617373333]
Elastic-Cache performs adaptive, layer-aware cache updates for diffusion large language models. Our method achieves significantly higher throughput ($6.8\times$ on GSM8K) than existing confidence-based approaches.
arXiv Detail & Related papers (2025-10-16T17:59:48Z) - d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching [7.004421957218099]
Diffusion-based large language models (dLLMs) suffer from inferior inference efficiency. We introduce d$^2$Cache, a training-free approximate KV cache framework for accelerating dLLM inference.
arXiv Detail & Related papers (2025-09-27T04:07:23Z) - PDTrim: Targeted Pruning for Prefill-Decode Disaggregation in Inference [16.54910591046794]
Large Language Models (LLMs) demonstrate exceptional capabilities across various tasks, but their deployment is constrained by high computational and memory costs. We propose a novel pruning method for PD disaggregation inference, enabling more precise and efficient block and KV cache pruning.
arXiv Detail & Related papers (2025-08-29T02:29:52Z) - Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction [72.27673320976933]
Diffusion Large Language Models (dLLMs) enable breakthroughs in reasoning and parallel decoding. Current caching techniques accelerate decoding by storing full-layer states, yet impose substantial memory usage. We propose Sparse-dLLM, the first training-free framework integrating dynamic cache eviction with sparse attention.
arXiv Detail & Related papers (2025-08-04T16:14:03Z) - Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations [67.35596444651037]
Vision-language models (VLMs) exhibit remarkable zero-shot capabilities but struggle with distribution shifts in downstream tasks when labeled data is unavailable. We propose a Reliable Test-time Adaptation (ReTA) method that enhances reliability from two perspectives.
arXiv Detail & Related papers (2025-07-13T05:37:33Z) - dKV-Cache: The Cache for Diffusion Language Models [53.85291644298835]
Diffusion Language Models (DLMs) have been seen as a promising competitor for autoregressive language models. We propose a KV-cache-like mechanism, delayed KV-Cache, for the denoising process of DLMs. Our approach is motivated by the observation that different tokens have distinct representation dynamics throughout the diffusion process.
arXiv Detail & Related papers (2025-05-21T17:32:10Z) - Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z) - DeepCache: Accelerating Diffusion Models for Free [65.02607075556742]
DeepCache is a training-free paradigm that accelerates diffusion models from the perspective of model architecture.
DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Under the same throughput, DeepCache achieves comparable or even marginally improved results compared with DDIM or PLMS.
arXiv Detail & Related papers (2023-12-01T17:01:06Z)