Visual Analysis Motivated Rate-Distortion Model for Image Coding
- URL: http://arxiv.org/abs/2104.10315v1
- Date: Wed, 21 Apr 2021 02:27:34 GMT
- Title: Visual Analysis Motivated Rate-Distortion Model for Image Coding
- Authors: Zhimeng Huang, Chuanmin Jia, Shanshe Wang, Siwei Ma
- Abstract summary: This paper proposes a visual analysis-motivated rate-distortion model for Versatile Video Coding (VVC) intra compression.
The proposed model has two major contributions, a novel rate allocation strategy and a new distortion measurement model.
- Score: 34.76677294980739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimized for pixel fidelity metrics, images compressed by existing image
codecs face systematic challenges when used for visual analysis tasks,
especially under low-bitrate coding. This paper proposes a visual
analysis-motivated rate-distortion model for Versatile Video Coding (VVC) intra
compression. The proposed model has two major contributions, a novel rate
allocation strategy and a new distortion measurement model. We first propose
the region of interest for machine (ROIM) to evaluate the degree of importance
for each coding tree unit (CTU) in visual analysis. Then, a novel CTU-level bit
allocation model is proposed based on ROIM and the local texture
characteristics of each CTU. After an in-depth analysis of multiple distortion
models, a visual-analysis-friendly distortion criterion is subsequently proposed
by extracting deep features of each coding unit (CU). To alleviate the problem
of lacking spatial context information when calculating the distortion of each
CU, we finally propose a multi-scale feature distortion (MSFD) metric using
different neighboring pixels by weighting the extracted deep features in each
scale. Extensive experimental results show that the proposed scheme could
achieve up to 28.17% bitrate saving under the same analysis performance among
several typical visual analysis tasks such as image classification, object
detection, and semantic segmentation.
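
The two ideas named in the abstract, CTU-level bit allocation and the multi-scale feature distortion (MSFD), can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulas, so everything below is an assumption made for illustration: the proportional allocation rule (importance times texture), the use of mean squared error between feature vectors, and the names `allocate_ctu_bits` and `multi_scale_feature_distortion` are hypothetical and are not taken from the paper or from any VVC reference software.

```python
from typing import List, Sequence


def allocate_ctu_bits(total_bits: float,
                      importance: Sequence[float],
                      texture: Sequence[float]) -> List[float]:
    """Split a bit budget across CTUs in proportion to importance * texture.

    ASSUMPTION: the product rule is illustrative only; the paper's ROIM-based
    allocation model is not specified in the abstract.
    """
    weights = [i * t for i, t in zip(importance, texture)]
    total = sum(weights)
    if total == 0:
        # No CTU stands out: fall back to a uniform split.
        return [total_bits / len(weights)] * len(weights)
    return [total_bits * w / total for w in weights]


def multi_scale_feature_distortion(orig_feats: Sequence[Sequence[float]],
                                   recon_feats: Sequence[Sequence[float]],
                                   scale_weights: Sequence[float]) -> float:
    """Weighted sum, over scales, of the MSE between two feature vectors.

    ASSUMPTION: each scale is represented by a flat list of feature values,
    and per-scale distortion is plain mean squared error.
    """
    distortion = 0.0
    for f_orig, f_recon, weight in zip(orig_feats, recon_feats, scale_weights):
        mse = sum((a - b) ** 2 for a, b in zip(f_orig, f_recon)) / len(f_orig)
        distortion += weight * mse
    return distortion
```

In this sketch, a CTU with three times the importance of its neighbor (and equal texture) receives three times the bits, and the distortion of a CU grows with feature mismatch at every scale, weighted by how much each scale is trusted.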
Related papers
- Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem [23.833099288826045]
Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metrics scores.
However, they often result in blurry images due to insufficient recovery of high-frequency details.
We propose a novel method that incorporates Multi-Objective Optimization (MOO) into the training process of SISR models to balance perceptual quality and distortion.
arXiv Detail & Related papers (2024-09-05T02:14:04Z)
- A Rate-Distortion-Classification Approach for Lossy Image Compression [0.0]
In lossy image compression, the objective is to achieve minimal signal distortion while compressing images to a specified bit rate.
To bridge the gap between image compression and visual analysis, we propose a Rate-Distortion-Classification (RDC) model for lossy image compression.
arXiv Detail & Related papers (2024-05-06T14:11:36Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Deep Learning-Based Defect Classification and Detection in SEM Images [1.9206693386750882]
In particular, we train RetinaNet models using different ResNet and VGGNet architectures as backbones.
We propose a preference-based ensemble strategy to combine the output predictions from different models in order to achieve better performance on classification and detection of defects.
arXiv Detail & Related papers (2022-06-20T16:34:11Z)
- Sci-Net: a Scale Invariant Model for Building Detection from Aerial Images [0.0]
We propose a Scale-invariant neural network (Sci-Net) that is able to segment buildings present in aerial images at different spatial resolutions.
Specifically, we modified the U-Net architecture and fused it with dense Atrous Spatial Pyramid Pooling (ASPP) to extract fine-grained multi-scale representations.
arXiv Detail & Related papers (2021-11-12T16:45:20Z)
- Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging, to an extent, the separate research tracks of video/image compression and feature compression.
This paper summarizes VCM methodology and philosophy based on existing academic and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- Gigapixel Histopathological Image Analysis using Attention-based Neural Networks [7.1715252990097325]
We propose a CNN structure consisting of a compressing path and a learning path.
Our method integrates both global and local information, is flexible with regard to the size of the input images and only requires weak image-level labels.
arXiv Detail & Related papers (2021-01-25T10:18:52Z)
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight that the rectified results of distorted images of the same scene from different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
- Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)