Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model
- URL: http://arxiv.org/abs/2501.00946v1
- Date: Wed, 01 Jan 2025 20:16:27 GMT
- Title: Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model
- Authors: Omid Saghatchian, Atiyeh Gh. Moghadam, Ahmad Nickabadi
- Abstract summary: Diffusion models are hindered by their high computational cost and slow inference.
One such approach reduces the number of tokens fed into the self-attention and is known as token merging (ToMe).
- Score: 2.580765958706854
- Abstract: Diffusion models have emerged as a promising approach for generating high-quality, high-dimensional images. Nevertheless, these models are hindered by their high computational cost and slow inference, partly due to the quadratic computational complexity of the self-attention mechanism with respect to input size. Various approaches have been proposed to address this drawback. One such approach focuses on reducing the number of tokens fed into the self-attention, known as token merging (ToMe). In our method, called cached adaptive token merging (CA-ToMe), we calculate the similarity between tokens and then merge the r proportion of the most similar tokens. Motivated by the repetitive patterns observed in adjacent steps and the variation in the frequency of similarities, we enhance this approach with an adaptive threshold for merging tokens and a caching mechanism that stores similar pairs across several adjacent steps. Empirical results demonstrate that our method operates as a training-free acceleration method, achieving a speedup factor of 1.24 in the denoising process while maintaining the same FID scores compared to existing approaches.
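The merging and caching scheme described in the abstract (score pairwise token similarity, merge the most similar r-fraction of tokens, and reuse the chosen pairs across a few adjacent denoising steps) can be illustrated with a short, self-contained sketch. The code below is an assumption-laden approximation, not the authors' released implementation: the helper names, the cosine-similarity score, the averaging merge rule, and the cache refresh interval are all illustrative choices.

```python
# Minimal sketch of cached adaptive token merging in the spirit of the abstract.
# NOT the authors' code: helper names, cosine similarity, averaging, and the
# cache refresh interval are illustrative assumptions.
import torch
import torch.nn.functional as F


def find_merge_pairs(tokens: torch.Tensor, merge_ratio: float, threshold: float):
    """Select up to merge_ratio * N token pairs whose cosine similarity exceeds threshold."""
    n = tokens.shape[0]
    normed = F.normalize(tokens, dim=-1)
    sim = (normed @ normed.T).triu(diagonal=1)      # [N, N]; each unordered pair counted once
    num_candidates = max(1, int(merge_ratio * n))
    vals, idx = sim.flatten().topk(num_candidates)
    return [(int(i) // n, int(i) % n) for v, i in zip(vals, idx) if float(v) > threshold]


def merge_tokens(tokens: torch.Tensor, pairs):
    """Average each selected pair into its first token and drop the second."""
    keep = torch.ones(tokens.shape[0], dtype=torch.bool)
    out = tokens.clone()
    for a, b in pairs:
        if keep[a] and keep[b]:                     # avoid chaining merges within one step
            out[a] = 0.5 * (out[a] + out[b])
            keep[b] = False
    return out[keep]


class PairCache:
    """Cache merge pairs and reuse them for several adjacent denoising steps."""

    def __init__(self, refresh_every: int = 3):     # refresh interval is an assumed hyperparameter
        self.refresh_every = refresh_every
        self.pairs = None

    def pairs_for_step(self, step: int, tokens: torch.Tensor,
                       merge_ratio: float = 0.5, threshold: float = 0.4):
        if self.pairs is None or step % self.refresh_every == 0:
            self.pairs = find_merge_pairs(tokens, merge_ratio, threshold)
        return self.pairs
```

In a denoising loop, one would call `pairs_for_step` before each self-attention block and run attention on the shortened token sequence returned by `merge_tokens`; how merged tokens are unmerged (or broadcast back) afterwards is omitted here, as is batching.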
Related papers
- CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models [5.406829638216823]
Diffusion models have revolutionized generative tasks, especially in the domain of text-to-image synthesis.
However, their iterative denoising process demands substantial computational resources.
We present a novel acceleration strategy that integrates token-level pruning with caching techniques to tackle this computational challenge.
arXiv Detail & Related papers (2025-02-01T13:46:02Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{1 - \frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation [8.46894039954642]
We propose a novel multi-scale token adaptation algorithm for interactive segmentation.
By performing top-k operations across multi-scale tokens, the computational complexity is greatly simplified.
We also propose a token learning algorithm based on contrastive loss.
arXiv Detail & Related papers (2024-01-09T07:59:42Z)
- Token Fusion: Bridging the Gap between Token Pruning and Token Merging [71.84591084401458]
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs.
However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging.
We introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging.
arXiv Detail & Related papers (2023-12-02T04:29:19Z)
- Deep Hashing via Householder Quantization [3.106177436374861]
Hashing is at the heart of large-scale image similarity search.
A common solution is to employ loss functions that combine a similarity learning term and a quantization penalty term.
We propose an alternative quantization strategy that decomposes the learning problem in two stages.
arXiv Detail & Related papers (2023-11-07T18:47:28Z)
- Which Tokens to Use? Investigating Token Reduction in Vision Transformers [64.99704164972513]
We study the reduction patterns of 10 different token reduction methods using four image classification datasets.
We find that the Top-K pruning method is a surprisingly strong baseline (a minimal sketch of such a baseline appears after this list).
The similarity of reduction patterns is a moderate-to-strong proxy for model performance.
arXiv Detail & Related papers (2023-08-09T01:51:07Z)
- Linear Self-Attention Approximation via Trainable Feedforward Kernel [77.34726150561087]
In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches.
We aim to expand the idea of trainable kernel methods to approximate the self-attention mechanism of the Transformer architecture.
arXiv Detail & Related papers (2022-11-08T08:14:11Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in deep-learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
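As a companion to the Top-K pruning baseline highlighted in the "Which Tokens to Use?" entry above, a minimal sketch of such a baseline is shown below. The norm-based importance score and the default keep ratio are illustrative assumptions, not details taken from that paper.

```python
# Minimal sketch of a Top-K token-pruning baseline: keep the K highest-scoring
# tokens and drop the rest. The L2 norm is an assumed stand-in for an
# importance score; attention-based scores are a common alternative.
import torch


def topk_prune(tokens: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """tokens: [N, D]; keep the keep_ratio * N highest-scoring tokens, preserving order."""
    k = max(1, int(keep_ratio * tokens.shape[0]))
    scores = tokens.norm(dim=-1)                      # per-token importance proxy
    keep_idx = scores.topk(k).indices.sort().values   # sort to restore original order
    return tokens[keep_idx]
```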