Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
- URL: http://arxiv.org/abs/2412.16822v1
- Date: Sun, 22 Dec 2024 02:04:17 GMT
- Title: Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
- Authors: Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin,
- Abstract summary: Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency.
We propose DiffRatio-MoD, a dynamic DiT inference framework with differentiable compression ratios.
- Score: 55.87192133758051
- License:
- Abstract: Diffusion Transformers (DiTs) have achieved state-of-the-art (SOTA) image generation quality but suffer from high latency and memory inefficiency, making them difficult to deploy on resource-constrained devices. One key efficiency bottleneck is that existing DiTs apply equal computation across all regions of an image. However, not all image tokens are equally important, and certain localized areas require more computation, such as objects. To address this, we propose DiffRatio-MoD, a dynamic DiT inference framework with differentiable compression ratios, which automatically learns to dynamically route computation across layers and timesteps for each image token, resulting in Mixture-of-Depths (MoD) efficient DiT models. Specifically, DiffRatio-MoD integrates three features: (1) A token-level routing scheme where each DiT layer includes a router that is jointly fine-tuned with model weights to predict token importance scores. In this way, unimportant tokens bypass the entire layer's computation; (2) A layer-wise differentiable ratio mechanism where different DiT layers automatically learn varying compression ratios from a zero initialization, resulting in large compression ratios in redundant layers while others remain less compressed or even uncompressed; (3) A timestep-wise differentiable ratio mechanism where each denoising timestep learns its own compression ratio. The resulting pattern shows higher ratios for noisier timesteps and lower ratios as the image becomes clearer. Extensive experiments on both text-to-image and inpainting tasks show that DiffRatio-MoD effectively captures dynamism across token, layer, and timestep axes, achieving superior trade-offs between generation quality and efficiency compared to prior works.
Related papers
- OminiControl: Minimal and Universal Control for Diffusion Transformer [68.3243031301164]
OminiControl is a framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models.
At its core, OminiControl leverages a parameter reuse mechanism, enabling the DiT to encode image conditions using itself as a powerful backbone.
OminiControl addresses a wide range of image conditioning tasks in a unified manner, including subject-driven generation and spatially-aligned conditions.
arXiv Detail & Related papers (2024-11-22T17:55:15Z) - HPC: Hierarchical Progressive Coding Framework for Volumetric Video [39.403294185116]
Volumetric video based on Neural Radiance Field (NeRF) holds vast potential for various 3D applications.
Current NeRF compression lacks the flexibility to adjust video quality and within a single model for various network and device capacities.
We propose HPC, a novel hierarchical progressive video coding framework achieving variable using a single model.
arXiv Detail & Related papers (2024-07-12T06:34:24Z) - Progressive Learning with Visual Prompt Tuning for Variable-Rate Image
Compression [60.689646881479064]
We propose a progressive learning paradigm for transformer-based variable-rate image compression.
Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively.
Our model outperforms all current variable image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed image compression methods trained from scratch.
arXiv Detail & Related papers (2023-11-23T08:29:32Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - DiffRate : Differentiable Compression Rate for Efficient Vision
Transformers [98.33906104846386]
Token compression aims to speed up large-scale vision transformers (e.g. ViTs) by pruning (dropping) or merging tokens.
DiffRate is a novel token compression method that has several appealing properties prior arts do not have.
arXiv Detail & Related papers (2023-05-29T10:15:19Z) - High-Fidelity Variable-Rate Image Compression via Invertible Activation
Transformation [24.379052026260034]
We propose the Invertible Activation Transformation (IAT) module to tackle the issue of high-fidelity fine variable-rate image compression.
IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity.
Our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
arXiv Detail & Related papers (2022-09-12T07:14:07Z) - Variable-Rate Deep Image Compression through Spatially-Adaptive Feature
Transform [58.60004238261117]
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT arXiv:1804.02815)
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compressions for various tasks.
arXiv Detail & Related papers (2021-08-21T17:30:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.