Customizable ROI-Based Deep Image Compression
- URL: http://arxiv.org/abs/2507.00373v3
- Date: Thu, 03 Jul 2025 03:31:28 GMT
- Title: Customizable ROI-Based Deep Image Compression
- Authors: Jian Jin, Fanxin Xia, Feng Ding, Xinfeng Zhang, Meiqin Liu, Yao Zhao, Weisi Lin, Lili Meng
- Abstract summary: Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing the ROI for higher-quality reconstruction. Existing ROI-based image compression schemes predefine the ROI, making it unchangeable. This work proposes a paradigm for customizable ROI-based deep image compression.
- Score: 69.93869435045916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing the ROI for higher-quality reconstruction. However, as users (including human clients and downstream machine tasks) become more diverse, ROI-based image compression needs to be customizable to support various preferences. For example, different users may define distinct ROIs or require different quality trade-offs between ROI and non-ROI. Existing ROI-based image compression schemes predefine the ROI, making it unchangeable, and lack effective mechanisms to balance reconstruction quality between ROI and non-ROI. This work proposes a paradigm for customizable ROI-based deep image compression. First, we develop a Text-controlled Mask Acquisition (TMA) module, which allows users to customize their ROI for compression simply by inputting the corresponding semantic text. This makes the encoder text-controlled. Second, we design a Customizable Value Assign (CVA) mechanism, which masks the non-ROI to a user-chosen extent rather than a constant one, in order to manage the reconstruction quality trade-off between ROI and non-ROI. Finally, we present a Latent Mask Attention (LMA) module, where the latent spatial prior of the mask and the latent Rate-Distortion Optimization (RDO) prior of the image are extracted and fused in the latent space, and then used to optimize the latent representation of the source image. Experimental results demonstrate that the proposed customizable ROI-based deep image compression paradigm effectively meets the customization needs for ROI definition and mask acquisition, as well as for managing the reconstruction quality trade-off between ROI and non-ROI.
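To make the CVA idea in the abstract concrete, here is a minimal sketch of masking the non-ROI to a user-chosen extent. The abstract does not give the actual formulation, so everything here is an assumption for illustration: the function names `assign_mask_values` and `apply_mask`, the parameter `non_roi_value`, its [0, 1] range, and the choice of element-wise multiplication are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def assign_mask_values(roi_mask, non_roi_value):
    """Turn a binary ROI mask into a soft mask (hypothetical helper).

    ROI pixels keep weight 1.0; non-ROI pixels get the user-chosen
    `non_roi_value` in [0, 1]. A value of 0.0 suppresses the non-ROI
    entirely, while 1.0 treats ROI and non-ROI alike.
    """
    if not 0.0 <= non_roi_value <= 1.0:
        raise ValueError("non_roi_value must lie in [0, 1]")
    roi = np.asarray(roi_mask, dtype=bool)
    return np.where(roi, 1.0, float(non_roi_value))


def apply_mask(image, soft_mask):
    """Scale an H x W x C image by an H x W soft mask (assumed operation)."""
    image = np.asarray(image, dtype=np.float64)
    return image * soft_mask[..., np.newaxis]


# Toy example: a 4 x 4 image whose central 2 x 2 block is the ROI.
roi_mask = np.zeros((4, 4), dtype=bool)
roi_mask[1:3, 1:3] = True
image = np.ones((4, 4, 3))

soft = assign_mask_values(roi_mask, non_roi_value=0.25)
masked = apply_mask(image, soft)
```

In this sketch, raising `non_roi_value` toward 1.0 preserves more of the background, and lowering it toward 0.0 concentrates the signal in the ROI, which is the kind of user-controlled trade-off the abstract describes.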
Related papers
- UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior [56.35236964617809]
Image restoration aims to recover content from inputs degraded by various factors, such as adverse weather, blur, and noise. This paper introduces UniRestore, a unified image restoration model that bridges the gap between perceptual image restoration (PIR) and task-oriented image restoration (TIR). We propose a Complementary Feature Restoration Module (CFRM) to reconstruct degraded encoder features and a Task Feature Adapter (TFA) module to facilitate adaptive feature fusion in the decoder.
arXiv Detail & Related papers (2025-01-22T08:06:48Z) - ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image Identification [1.9580473532948401]
We propose a novel ROI-aware multiscale cross-attention vision transformer (ROI-ViT).
The proposed ROI-ViT is designed using dual branches, called Pest and ROI branches, which take different types of maps as input: Pest images and ROI maps.
The experimental results show that the proposed ROI-ViT achieves 81.81%, 99.64%, and 84.66% for IP102, D0, and SauTeg pest datasets, respectively.
arXiv Detail & Related papers (2023-12-28T09:16:27Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - ROI-based Deep Image Compression with Swin Transformers [14.044999439481511]
Encoding the Region of Interest (ROI) with better quality than the background has many applications, including video conferencing systems.
We propose an ROI-based image compression framework with Swin transformers as the main building blocks of the autoencoder network.
arXiv Detail & Related papers (2023-05-12T22:05:44Z) - Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification [49.57112924976762]
The cross-resolution person re-identification problem aims to match low-resolution (LR) query identity images against high-resolution (HR) gallery images.
It is a challenging and practical problem, since query images often suffer from resolution degradation due to the varying capture conditions of real-world cameras.
This paper explores an alternative SR-free paradigm to directly compare HR and LR images via a dynamic metric, which is adaptive to the resolution of a query image.
arXiv Detail & Related papers (2022-07-09T03:49:51Z) - Region-of-Interest Based Neural Video Compression [19.81699221664852]
We introduce two models for ROI-based neural video coding.
First, we propose an implicit model that is fed a binary ROI mask and is trained by de-emphasizing the distortion of the background.
We show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI.
arXiv Detail & Related papers (2022-03-03T19:37:52Z) - Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image.
Previous model-based methods mainly take the entire image as a whole and assume a prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of priors on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z) - MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models complete training based on only one image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z) - Rank-One Network: An Effective Framework for Image Restoration [18.55701190218365]
We propose a new framework comprised of two modules, i.e., the RO decomposition and RO reconstruction.
The RO decomposition is developed to decompose a corrupted image into the RO components and residual.
The RO reconstruction aims to reconstruct the important information from the RO components and the residual, respectively, and to restore the image from this reconstructed information.
arXiv Detail & Related papers (2020-11-25T09:39:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.