Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
- URL: http://arxiv.org/abs/2412.16553v2
- Date: Mon, 30 Dec 2024 01:00:49 GMT
- Title: Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
- Authors: Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Shen Li, Yong Li, Fei Chao, Zhanpeng Zeng, Rongrong Ji
- Abstract summary: We propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs.
First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images.
Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization and diverse semantics in synthetic images.
- Score: 59.772673692679085
- Abstract: Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community. Recently, the unique architecture of vision transformers (ViTs) has driven the development of specialized DFQ techniques. However, we observe that the synthetic images from existing methods suffer from deficient semantics compared to real images, thereby compromising performance. Motivated by this, we propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs. First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images. Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization and diverse semantics in synthetic images. Finally, SPDFQ employs Softlabel Learning (SL), where soft learning targets are adapted to encourage more complex semantics and accommodate images augmented by MSR. Experimental results demonstrate that SPDFQ significantly outperforms existing methods. For instance, SPDFQ achieves a 15.52% increase in top-1 accuracy on ImageNet for W4A4 ViT-B.
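The APA and SL components lend themselves to a short illustration. Below is a minimal PyTorch sketch of what APA's alignment term and SL's soft targets might look like; the blur-based prior construction, the choice of CLS-to-patch attention, and all function names are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def random_attention_prior(grid: int, temp: float = 1.0) -> torch.Tensor:
    """Sample a smooth random prior over the patch grid (hypothetical
    construction: blurred Gaussian noise, normalized into a distribution)."""
    noise = torch.randn(1, 1, grid, grid)
    smooth = F.avg_pool2d(noise, kernel_size=3, stride=1, padding=1)  # cheap blur
    return F.softmax(smooth.flatten(1) / temp, dim=-1)  # [1, grid*grid]

def apa_loss(cls_attention: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    """Pull the ViT's CLS-to-patch attention toward the random prior.

    cls_attention: [B, num_patches], averaged over heads (our assumption
    about where the prior is applied). prior: [1, num_patches].
    """
    attn = cls_attention / cls_attention.sum(dim=-1, keepdim=True)
    # KL divergence steers the synthetic image toward foreground-like,
    # semantically structured attention.
    return F.kl_div((attn + 1e-12).log(), prior.expand_as(attn),
                    reduction="batchmean")

def soft_labels(num_classes: int, k: int = 3) -> torch.Tensor:
    """Hypothetical SL target: probability mass spread over k random classes
    instead of a one-hot label, encouraging multi-semantic images."""
    logits = torch.full((num_classes,), float("-inf"))
    idx = torch.randperm(num_classes)[:k]
    logits[idx] = torch.randn(k)
    return torch.softmax(logits, dim=-1)
```

In a full data-free pipeline, terms like these would be added to the usual image-synthesis objectives, with `apa_loss` steering attention toward the random prior and `soft_labels` standing in for one-hot targets on MSR-augmented patches.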
Related papers
- SQ-GAN: Semantic Image Communications Using Masked Vector Quantization [55.02795214161371]
This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
arXiv Detail & Related papers (2025-02-13T17:35:57Z)
- Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning [70.98890307376548]
We propose a novel Patch-wise Cross-modal feature Mix-up (PCM) mechanism to adaptively mitigate unfaithful content during training.
Our PCM-Net ranks first in both in-domain and cross-domain zero-shot image captioning.
arXiv Detail & Related papers (2024-12-31T13:39:08Z)
- Vision Transformer-based Semantic Communications With Importance-Aware Quantization [13.328970689723096]
This paper presents a vision transformer (ViT)-based semantic communication system with importance-aware quantization (IAQ) for wireless image transmission.
We show that our IAQ framework outperforms conventional image compression methods in both error-free and realistic communication scenarios.
arXiv Detail & Related papers (2024-12-08T19:24:47Z)
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the user's intent.
Existing methods have made great progress with advanced large vision-language (VL) models in the CIR task; however, they generally suffer from two main issues: a lack of labeled triplets for model training and difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection [13.840950434728533]
State-of-the-art Synthetic Image Detection (SID) research has produced strong evidence of the advantages of feature extraction from foundation models.
We leverage the image representations extracted from intermediate Transformer blocks of CLIP's image encoder via a lightweight network (a minimal sketch of this idea appears after the list).
Our method is compared against the state-of-the-art by evaluating it on 20 test datasets and exhibits an average +10.6% absolute performance improvement.
arXiv Detail & Related papers (2024-02-29T12:18:43Z)
- Semantic-aware Data Augmentation for Text-to-image Synthesis [19.28143363034362]
In text-to-image synthesis (T2Isyn), current augmentation practice still suffers from semantic mismatch between augmented text-image pairs.
In this paper, we develop a novel Semantic-aware Data Augmentation framework dedicated to T2Isyn.
arXiv Detail & Related papers (2023-12-13T07:57:40Z)
- Robustness-Guided Image Synthesis for Data-Free Quantization [15.91924736452861]
We propose Robustness-Guided Image Synthesis (RIS), a simple but effective method to enrich the semantics of synthetic images and improve image diversity.
RIS achieves state-of-the-art performance across various data-free quantization settings and can be extended to other data-free compression tasks.
arXiv Detail & Related papers (2023-10-05T16:39:14Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Towards Better Text-Image Consistency in Text-to-Image Generation [15.735515302139335]
We develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD).
We further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which can fuse semantic information at different granularities.
Our PDF-GAN can lead to significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.
arXiv Detail & Related papers (2022-10-27T07:47:47Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
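On the intermediate-representation entry above, here is a minimal sketch, assuming the "lightweight network" is a learned softmax weighting over block-wise CLS tokens followed by a linear head; the checkpoint name and fusion scheme are illustrative guesses, not the paper's design.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class IntermediateCLIPDetector(nn.Module):
    """Probe a frozen CLIP image encoder's intermediate blocks for
    synthetic-image detection (a hedged sketch, not the paper's code)."""

    def __init__(self, model_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.backbone = CLIPVisionModel.from_pretrained(model_name)
        self.backbone.requires_grad_(False)  # keep the foundation model frozen
        dim = self.backbone.config.hidden_size
        n_layers = self.backbone.config.num_hidden_layers
        # Learnable per-block weights plus a small binary head: our assumption
        # about the exact form of the "lightweight network".
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        self.head = nn.Linear(dim, 1)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        out = self.backbone(pixel_values=pixel_values, output_hidden_states=True)
        # hidden_states[0] is the embedding layer; take each block's CLS token.
        cls_per_block = torch.stack([h[:, 0] for h in out.hidden_states[1:]], dim=0)
        w = torch.softmax(self.layer_weights, dim=0)
        fused = (w[:, None, None] * cls_per_block).sum(dim=0)  # [B, dim]
        return self.head(fused)  # logit: synthetic vs. real
```

The only trainable parameters here are the per-block weights and the linear head, matching the spirit of probing a frozen foundation model rather than fine-tuning it.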
This list is automatically generated from the titles and abstracts of the papers on this site.