Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion
- URL: http://arxiv.org/abs/2411.15113v1
- Date: Fri, 22 Nov 2024 18:29:37 GMT
- Title: Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion
- Authors: Samarth N Ramesh, Zhixue Zhao,
- Abstract summary: This paper presents a pioneering study on post-training pruning of Stable Diffusion 2.
It addresses the critical need for model compression in text-to-image domain.
We propose an optimal pruning configuration that prunes the text encoder to 47.5% and the diffusion generator to 35%.
- Score: 3.399289369740637
- License:
- Abstract: As text-to-image models grow increasingly powerful and complex, their burgeoning size presents a significant obstacle to widespread adoption, especially on resource-constrained devices. This paper presents a pioneering study on post-training pruning of Stable Diffusion 2, addressing the critical need for model compression in text-to-image domain. Our study tackles the pruning techniques for the previously unexplored multi-modal generation models, and particularly examines the pruning impact on the textual component and the image generation component separately. We conduct a comprehensive comparison on pruning the model or the single component of the model in various sparsities. Our results yield previously undocumented findings. For example, contrary to established trends in language model pruning, we discover that simple magnitude pruning outperforms more advanced techniques in text-to-image context. Furthermore, our results show that Stable Diffusion 2 can be pruned to 38.5% sparsity with minimal quality loss, achieving a significant reduction in model size. We propose an optimal pruning configuration that prunes the text encoder to 47.5% and the diffusion generator to 35%. This configuration maintains image generation quality while substantially reducing computational requirements. In addition, our work uncovers intriguing questions about information encoding in text-to-image models: we observe that pruning beyond certain thresholds leads to sudden performance drops (unreadable images), suggesting that specific weights encode critical semantics information. This finding opens new avenues for future research in model compression, interoperability, and bias identification in text-to-image models. By providing crucial insights into the pruning behavior of text-to-image models, our study lays the groundwork for developing more efficient and accessible AI-driven image generation systems
Related papers
- TextCraftor: Your Text Encoder Can be Image Quality Controller [65.27457900325462]
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation.
We propose a proposed fine-tuning approach, TextCraftor, to enhance the performance of text-to-image diffusion models.
arXiv Detail & Related papers (2024-03-27T19:52:55Z) - Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR)
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z) - Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [22.11487736315616]
Rectified flow is a recent generative model formulation that connects data and noise in a straight line.
We improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales.
We present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities.
arXiv Detail & Related papers (2024-03-05T18:45:39Z) - RenAIssance: A Survey into AI Text-to-Image Generation in the Era of
Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps.
In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z) - Trade-offs in Fine-tuned Diffusion Models Between Accuracy and
Interpretability [5.865936619867771]
We unravel a consequential trade-off between image fidelity as gauged by conventional metrics and model interpretability in generative diffusion models.
We present a set of design principles for the development of truly interpretable generative models.
arXiv Detail & Related papers (2023-03-31T09:11:26Z) - Benchmarking Robustness of Multimodal Image-Text Models under
Distribution Shift [50.64474103506595]
We investigate the robustness of 12 popular open-sourced image-text models under common perturbations on five tasks.
Character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data.
arXiv Detail & Related papers (2022-12-15T18:52:03Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.