SQ-GAN: Semantic Image Communications Using Masked Vector Quantization
- URL: http://arxiv.org/abs/2502.09520v1
- Date: Thu, 13 Feb 2025 17:35:57 GMT
- Title: SQ-GAN: Semantic Image Communications Using Masked Vector Quantization
- Authors: Francesco Pezone, Sergio Barbarossa, Giuseppe Caire,
- Abstract summary: This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
- Score: 55.02795214161371
- License:
- Abstract: This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach integrating generative models to optimize image compression for semantic/task-oriented communications. SQ-GAN employs off-the-shelf semantic semantic segmentation and a new specifically developed semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images. SQ-GAN outperforms state-of-the-art image compression schemes such as JPEG2000 and BPG across multiple metrics, including perceptual quality and semantic segmentation accuracy on the post-decoding reconstructed image, at extreme low compression rates expressed in bits per pixel.
Related papers
- Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers [59.772673692679085]
We propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs.
First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images.
Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which utilizes localized patch optimization to prompt efficient parameterization.
arXiv Detail & Related papers (2024-12-21T09:30:45Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - Exploring Effective Mask Sampling Modeling for Neural Image Compression [171.35596121939238]
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
arXiv Detail & Related papers (2023-06-09T06:50:20Z) - SigVIC: Spatial Importance Guided Variable-Rate Image Compression [43.062173445454775]
variable-rate mechanism has improved the flexibility and efficiency of learning-based image compression.
One of the most common approaches for variable-rate is to channel-wisely or spatial-uniformly scale the internal features.
We introduce a Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed for adaptively learning a spatial importance mask.
arXiv Detail & Related papers (2023-03-16T06:57:51Z) - MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation [41.029441562130984]
Two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images.
Our proposed modulated VQGAN is able to greatly improve the reconstructed image quality as well as provide high-fidelity image generation.
arXiv Detail & Related papers (2022-09-19T13:26:51Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - Transformer-based Image Compression [18.976159633970177]
Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders.
TIC rivals with state-of-the-art approaches including deep convolutional neural networks (CNNs) based learnt image coding (LIC) methods and handcrafted rules-based intra profile of recently-approved Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-11-12T13:13:20Z) - Variable-Rate Deep Image Compression through Spatially-Adaptive Feature
Transform [58.60004238261117]
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT arXiv:1804.02815)
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compressions for various tasks.
arXiv Detail & Related papers (2021-08-21T17:30:06Z) - Hierarchical Quantized Autoencoders [3.9146761527401432]
We motivate the use of a hierarchy of Vector Quantized Variencoders (VQ-VAEs) to attain high factors of compression.
We show that a combination of quantization and hierarchical latent structure aids likelihood-based image compression.
Our resulting scheme produces a Markovian series of latent variables that reconstruct images of high-perceptual quality.
arXiv Detail & Related papers (2020-02-19T11:26:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.