ROI-based Deep Image Compression with Swin Transformers
- URL: http://arxiv.org/abs/2305.07783v1
- Date: Fri, 12 May 2023 22:05:44 GMT
- Title: ROI-based Deep Image Compression with Swin Transformers
- Authors: Binglin Li and Jie Liang and Haisheng Fu and Jingning Han
- Abstract summary: Encoding the Region Of Interest (ROI) with better quality than the background has many applications, including video conferencing systems.
We propose an ROI-based image compression framework with Swin transformers as the main building blocks of the autoencoder network.
- Score: 14.044999439481511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Encoding the Region Of Interest (ROI) with better quality than the background
has many applications including video conferencing systems, video surveillance
and object-oriented vision tasks. In this paper, we propose an ROI-based image
compression framework with Swin transformers as the main building blocks for the
autoencoder network. The binary ROI mask is integrated into different layers of
the network to provide spatial information guidance. Based on the ROI mask, we
can control the relative importance of the ROI and non-ROI by modifying the
corresponding Lagrange multiplier $\lambda$ for different regions.
Experimental results show our model achieves higher ROI PSNR than other methods
and modest average PSNR for human evaluation. When tested on models pre-trained
with original images, it has superior object detection and instance
segmentation performance on the COCO validation dataset.
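The abstract's key idea — modifying the Lagrange multiplier $\lambda$ per region based on the binary ROI mask — can be sketched as a spatially weighted rate-distortion objective. This is a minimal illustrative sketch, not the paper's implementation: the function name, the specific $\lambda$ values, and the use of mean squared error are all assumptions.

```python
import numpy as np

def roi_rd_loss(x, x_hat, rate, roi_mask, lam_roi=0.02, lam_bg=0.002):
    """Rate plus spatially weighted distortion (illustrative sketch).

    x, x_hat : float arrays of the same shape (original / reconstruction)
    rate     : estimated bits for the latent representation (scalar)
    roi_mask : binary array, 1 inside the ROI, 0 in the background
    lam_roi, lam_bg : hypothetical Lagrange multipliers for ROI / background
    """
    sq_err = (x - x_hat) ** 2
    # Per-pixel lambda map: lam_roi inside the ROI, lam_bg elsewhere,
    # so ROI distortion is penalized more heavily than background distortion.
    lam_map = lam_bg + (lam_roi - lam_bg) * roi_mask
    distortion = np.mean(lam_map * sq_err)
    return rate + distortion
```

Raising the ratio `lam_roi / lam_bg` shifts bits from the background to the ROI, which is the control knob the abstract describes.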
Related papers
- ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image
Identification [1.9580473532948401]
We propose a novel ROI-aware multiscale cross-attention vision transformer (ROI-ViT).
The proposed ROI-ViT is designed using dual branches, called Pest and ROI branches, which take different types of maps as input: Pest images and ROI maps.
The experimental results show that the proposed ROI-ViT achieves 81.81%, 99.64%, and 84.66% for IP102, D0, and SauTeg pest datasets, respectively.
arXiv Detail & Related papers (2023-12-28T09:16:27Z) - Investigating the Robustness and Properties of Detection Transformers
(DETR) Toward Difficult Images [1.5727605363545245]
Transformer-based object detectors (DETR) have shown strong performance across machine vision tasks.
The critical issue to be addressed is how this model architecture can handle different image nuisances.
We studied this issue by measuring the performance of DETR with different experiments and benchmarking the network.
arXiv Detail & Related papers (2023-10-12T23:38:52Z) - Transformer-based Variable-rate Image Compression with
Region-of-interest Control [24.794581811606445]
This paper proposes a transformer-based learned image compression system.
It is capable of achieving variable-rate compression with a single model while supporting the region-of-interest functionality.
arXiv Detail & Related papers (2023-05-18T08:40:34Z) - Hierarchical Similarity Learning for Aliasing Suppression Image
Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and remits the aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z) - Region-of-Interest Based Neural Video Compression [19.81699221664852]
We introduce two models for ROI-based neural video coding.
First, we propose an implicit model that is fed with a binary ROI mask and is trained by de-emphasizing the distortion of the background.
We show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI.
arXiv Detail & Related papers (2022-03-03T19:37:52Z) - Learning Transformer Features for Image Quality Assessment [53.51379676690971]
We propose a unified IQA framework that utilizes CNN backbone and transformer encoder to extract features.
The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme.
arXiv Detail & Related papers (2021-12-01T13:23:00Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z) - MOGAN: Morphologic-structure-aware Generative Learning from a Single
Image [59.59698650663925]
Recently proposed generative models can complete training on only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z) - Deep Burst Super-Resolution [165.90445859851448]
We propose a novel architecture for the burst super-resolution task.
Our network takes multiple noisy RAW images as input, and generates a denoised, super-resolved RGB image as output.
In order to enable training and evaluation on real-world data, we additionally introduce the BurstSR dataset.
arXiv Detail & Related papers (2021-01-26T18:57:21Z) - Real Image Super Resolution Via Heterogeneous Model Ensemble using
GP-NAS [63.48801313087118]
We propose a new method for image super-resolution using a deep residual network with dense skip connections.
The proposed method won the first place in all three tracks of the AIM 2020 Real Image Super-Resolution Challenge.
arXiv Detail & Related papers (2020-09-02T22:33:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.