Related papers: Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

URL: http://arxiv.org/abs/2410.06149v1
Date: Tue, 8 Oct 2024 15:48:34 GMT
Title: Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach
Authors: Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan,
Abstract summary: This paper introduces a content-adaptive diffusion model for scalable image compression. The proposed method encodes fine textures through a diffusion process, enhancing perceptual quality. Experiments demonstrate the effectiveness of the proposed framework in both image reconstruction and downstream machine vision tasks.
Score: 44.03561901593423
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Traditional image codecs emphasize signal fidelity and human perception, often at the expense of machine vision tasks. Deep learning methods have demonstrated promising coding performance by utilizing rich semantic embeddings optimized for both human and machine vision. However, these compact embeddings struggle to capture fine details such as contours and textures, resulting in imperfect reconstructions. Furthermore, existing learning-based codecs lack scalability. To address these limitations, this paper introduces a content-adaptive diffusion model for scalable image compression. The proposed method encodes fine textures through a diffusion process, enhancing perceptual quality while preserving essential features for machine vision tasks. The approach employs a Markov palette diffusion model combined with widely used feature extractors and image generators, enabling efficient data compression. By leveraging collaborative texture-semantic feature extraction and pseudo-label generation, the method accurately captures texture information. A content-adaptive Markov palette diffusion model is then applied to represent both low-level textures and high-level semantic content in a scalable manner. This framework offers flexible control over compression ratios by selecting intermediate diffusion states, eliminating the need for retraining deep learning models at different operating points. Extensive experiments demonstrate the effectiveness of the proposed framework in both image reconstruction and downstream machine vision tasks such as object detection, segmentation, and facial landmark detection, achieving superior perceptual quality compared to state-of-the-art methods.

Related papers

SODiff: Semantic-Oriented Diffusion Model for JPEG Compression Artifacts Removal [50.90827365790281]
SODiff is a semantic-oriented one-step diffusion model for JPEG artifacts removal.<n>Our core idea is that effective restoration hinges on providing semantic-oriented guidance to the pre-trained diffusion model.<n>SAIPE extracts rich features from low-quality (LQ) images and projects them into an embedding space semantically aligned with that of the text encoder.
arXiv Detail & Related papers (2025-08-10T13:48:07Z)
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model [92.61216319417208]
We propose a novel diffusion model (DM)-based framework, dubbed ours, for image deblurring.<n>ours performs DM to generate the prior knowledge that aids in recovering the textures of blurry images.<n>To fully exploit the generated texture priors, we present the Texture Transfer Transformer layer (TTformer)
arXiv Detail & Related papers (2025-07-18T01:50:31Z)
Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression [90.59962443790593]
In this paper, we present a variable-rate image compression model based on invertible transform to overcome limitations. Specifically, we design a lightweight multi-scale invertible neural network, which maps the input image into multi-scale latent representations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods.
arXiv Detail & Related papers (2025-03-27T09:08:39Z)
Leveraging Diffusion Knowledge for Generative Image Compression with Fractal Frequency-Aware Band Learning [16.077768397480902]
generative image compression approaches produce detailed, realistic images instead of the only sharp-looking reconstructions. We propose a novel deep learning-based generative image compression method injected with diffusion knowledge. Our method achieves better distortions at high realism and better realism at low distortion than ever before.
arXiv Detail & Related papers (2025-03-14T11:41:33Z)
Stable Diffusion is a Natural Cross-Modal Decoder for Layered AI-generated Image Compression [7.643300240138419]
We introduce a scalable cross-modal compression framework that incorporates multiple human-comprehensible modalities. Our framework encodes images into a layered bitstream consisting of a semantic layer that delivers high-level semantic information. Our method proficiently restores both semantic and visual details, competing against baseline approaches at extremely lows.
arXiv Detail & Related papers (2024-12-17T15:01:35Z)
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS) A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression [58.618625678054826]
This study presents an enhanced neural compression method designed for optimal visual fidelity. We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
arXiv Detail & Related papers (2024-01-25T08:11:27Z)
ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images. We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image. The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
Lightweight Adaptive Feature De-drifting for Compressed Image Classification [10.265991649449507]
High-level vision models trained on high-quality images will suffer performance degradation when dealing with compressed images. Various learning-based JPEG artifact removal methods have been proposed to handle visual artifacts. This paper proposes a novel lightweight AFD module to boost the performance of pre-trained image classification models when facing compressed images.
arXiv Detail & Related papers (2024-01-03T13:03:44Z)
Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization. This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts. Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation [36.20575570779196]
We exploit the fine-grained-to-abstract and lowlevel-to-high-level feature hierarchy for the latent space of diffusion models. The hierarchical latent space of HDAE inherently encodes different abstract levels of semantics and provides more comprehensive semantic representations. We demonstrate the effectiveness of our proposed approach with extensive experiments and applications on image reconstruction, style mixing, controllable, detail-preserving and disentangled image manipulation.
arXiv Detail & Related papers (2023-04-24T05:35:59Z)
DELAD: Deep Landweber-guided deconvolution with Hessian and sparse prior [0.22940141855172028]
We present a model for non-blind image deconvolution that incorporates the classic iterative method into a deep learning application. We build our network based on the iterative Landweber deconvolution algorithm, which is integrated with trainable convolutional layers to enhance the recovered image structures and details.
arXiv Detail & Related papers (2022-09-30T11:15:03Z)
Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection [24.098604827919203]
We propose a joint learning strategy with deep texture and high-frequency features for CG image detection. A semantic segmentation map is generated to guide the affine transformation operation. The combination of the original image and the high-frequency components of the original and rendered images are fed into a multi-branch neural network equipped with attention mechanisms.
arXiv Detail & Related papers (2022-09-07T17:30:40Z)
Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way. Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction. To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.