Video Coding for Machines with Feature-Based Rate-Distortion
Optimization
- URL: http://arxiv.org/abs/2203.05890v1
- Date: Fri, 11 Mar 2022 12:49:50 GMT
- Title: Video Coding for Machines with Feature-Based Rate-Distortion
Optimization
- Authors: Kristian Fischer, Fabian Brand, Christian Herglotz, André Kaup
- Abstract summary: With the steady improvement of neural networks, more and more multimedia data is not observed by humans anymore.
We propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance.
We compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO.
- Score: 7.804710977378487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Common state-of-the-art video codecs are optimized to deliver a low bitrate
by providing a certain quality for the final human observer, which is achieved
by rate-distortion optimization (RDO). However, with the steady improvement of
neural networks solving computer vision tasks, more and more multimedia data is
no longer observed by humans but directly analyzed by neural networks. In
this paper, we propose a standard-compliant feature-based RDO (FRDO) that is
designed to increase the coding performance when the decoded frame is analyzed
by a neural network in a video coding for machines scenario. To that end, we
replace the pixel-based distortion metrics in the conventional RDO of VTM-8.0 with
distortion metrics calculated in the feature space created by the first layers
of a neural network. Throughout several tests with the segmentation network
Mask R-CNN and single images from the Cityscapes dataset, we compare the
proposed FRDO and its hybrid version HFRDO with different distortion measures
in the feature space against the conventional RDO. With HFRDO, up to 5.49 %
bitrate can be saved compared to the VTM-8.0 implementation in terms of
Bjøntegaard Delta Rate and using the weighted average precision as the quality
metric. Additionally, allowing the encoder to vary the quantization parameter
results in coding gains for the proposed HFRDO of up to 9.95 % compared to
conventional VTM.
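The core idea lends itself to a short illustration: the Lagrangian cost J = D + λ·R is unchanged, but the distortion D is computed between feature maps of the original and the reconstructed block instead of between pixels. The sketch below is only a minimal stand-in, assuming a torchvision ResNet-50 front end in place of the Mask R-CNN layers used in the paper and a toy candidate ranking outside any actual VTM-8.0 integration.

```python
# Minimal sketch of a feature-based rate-distortion cost (FRDO idea).
# Assumption: a truncated torchvision ResNet-50 stands in for "the first layers
# of a neural network"; the paper uses Mask R-CNN features inside the VTM-8.0
# mode decision, not a Python loop like this.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
# Keep only the stem and the first residual stage as the feature extractor.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool, backbone.layer1
)
for p in feature_extractor.parameters():
    p.requires_grad_(False)

def feature_distortion(original: torch.Tensor, reconstruction: torch.Tensor) -> torch.Tensor:
    """MSE between feature maps of the original and a decoded candidate block.
    In practice the inputs would also get the backbone's ImageNet normalization."""
    with torch.no_grad():
        return F.mse_loss(feature_extractor(reconstruction), feature_extractor(original))

def rd_cost(original, reconstruction, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = D_feature + lambda * R used to rank coding candidates."""
    return feature_distortion(original, reconstruction).item() + lam * rate_bits

# Usage: rank two hypothetical reconstructions of the same block.
x = torch.rand(1, 3, 128, 128)            # original block (normalized RGB)
cand_a = x + 0.02 * torch.randn_like(x)   # e.g. a low-QP reconstruction
cand_b = x + 0.06 * torch.randn_like(x)   # e.g. a high-QP reconstruction
print(rd_cost(x, cand_a, rate_bits=900.0, lam=0.01),
      rd_cost(x, cand_b, rate_bits=400.0, lam=0.01))
```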
Related papers
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image
Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework paired with a compute-efficient channel-wise auto-regressive prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, estimated on average at 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
arXiv Detail & Related papers (2023-07-12T11:45:54Z) - Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient
Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
The ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
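A minimal sketch of this idea follows; the module shapes and names are purely illustrative and not the VNVC architecture. The point is that a task head consumes the compact latent directly, so no pixel-level reconstruction is needed before analysis.

```python
# Conceptual sketch: analyze a coded representation without decoding to pixels.
# All modules here are toy stand-ins, not the VNVC framework.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Maps a frame to a compact latent (stand-in for a learned video encoder)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=4, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class LatentTaskHead(nn.Module):
    """Runs a vision task (here: a toy classifier) directly on the latent."""
    def __init__(self, ch=64, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, num_classes),
        )
    def forward(self, y):
        return self.head(y)

frame = torch.rand(1, 3, 256, 256)
latent = TinyEncoder()(frame)       # what would be transmitted/stored
logits = LatentTaskHead()(latent)   # analysis without reconstructing pixels
print(latent.shape, logits.shape)
```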
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
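One common way to penalize such redundancy, offered here only as an illustrative sketch (the exact penalty in the cited paper may differ), is to drive the off-diagonal correlations between bottleneck units toward zero:

```python
# Sketch of an explicit redundancy penalty on an autoencoder bottleneck:
# penalize off-diagonal entries of the correlation matrix of bottleneck units.
import torch

def redundancy_penalty(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """z: (batch, dim) bottleneck activations; returns mean squared off-diagonal correlation."""
    z = z - z.mean(dim=0, keepdim=True)
    z = z / (z.std(dim=0, keepdim=True) + eps)
    corr = (z.T @ z) / z.shape[0]                  # (dim, dim) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr)) # zero out the diagonal
    return (off_diag ** 2).mean()

# Usage inside a training loop: add this term to the reconstruction loss.
z = torch.randn(32, 16, requires_grad=True)        # a batch of bottleneck vectors
loss = redundancy_penalty(z)                       # + mse(x_hat, x) in practice
loss.backward()
print(loss.item())
```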
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions to describe the R-D behavior of NIC using deep networks and statistical modeling.
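As a rough illustration of what characteristic modeling means in practice, the snippet below fits a simple parametric curve D(R) = a·R^b + c to a handful of hypothetical (rate, distortion) measurements; the paper itself derives its functions with deep networks and statistical modeling, so this is only the curve-fitting idea, not its method.

```python
# Sketch: fit a parametric rate-distortion curve D(R) = a * R**b + c to a few
# measured points, then predict the distortion at an unseen rate.
import numpy as np
from scipy.optimize import curve_fit

def rd_model(rate, a, b, c):
    return a * np.power(rate, b) + c

# Hypothetical measurements: bitrate in bpp, distortion as MSE (made-up numbers).
rates = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
dists = np.array([95.0, 60.0, 34.0, 18.0, 9.0])

params, _ = curve_fit(rd_model, rates, dists, p0=(10.0, -1.0, 0.0), maxfev=10000)
print("fitted a, b, c:", params)
print("predicted MSE at 0.6 bpp:", rd_model(0.6, *params))
```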
arXiv Detail & Related papers (2021-06-24T12:23:05Z) - Perceptually-inspired super-resolution of compressed videos [18.72040343193715]
Spatial resolution adaptation is a technique that has often been employed in video compression to enhance coding efficiency.
Recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality.
In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial upsampling of compressed video using a modified CNN model.
arXiv Detail & Related papers (2021-06-15T13:50:24Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations and the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
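The DCT-domain splitting itself can be sketched in a few lines; the branches below are identity placeholders, and the fixed low-frequency mask stands in for the learned, input-dependent division described in the paper.

```python
# Sketch of DCT-domain frequency splitting: route low-frequency content through
# a cheap branch and high-frequency content through an expensive branch.
import numpy as np
from scipy.fft import dctn, idctn

def split_frequencies(block: np.ndarray, keep: int = 8):
    """2D DCT of a block; the top-left keep x keep coefficients form the 'low' part."""
    coeffs = dctn(block, norm="ortho")
    low_mask = np.zeros_like(coeffs, dtype=bool)
    low_mask[:keep, :keep] = True
    low = idctn(np.where(low_mask, coeffs, 0.0), norm="ortho")
    high = idctn(np.where(low_mask, 0.0, coeffs), norm="ortho")
    return low, high

def cheap_branch(x):      # e.g. lightweight filtering; identity placeholder here
    return x
def expensive_branch(x):  # e.g. a deep CNN; identity placeholder here
    return x

block = np.random.rand(32, 32)
low, high = split_frequencies(block, keep=8)
out = cheap_branch(low) + expensive_branch(high)
print(np.allclose(out, block))  # True: the split is exact when both branches are identity
```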
arXiv Detail & Related papers (2021-03-15T12:54:26Z) - Parallelized Rate-Distortion Optimized Quantization Using Deep Learning [9.886383889250064]
Rate-Distortion Optimized Quantization (RDOQ) has played an important role in the coding performance of recent video compression standards such as H.264/AVC, H.265/HEVC, VP9 and AV1.
This work addresses the high sequential complexity of RDOQ using a neural network-based approach, which learns to trade off rate and distortion during offline supervised training.
arXiv Detail & Related papers (2020-12-11T14:28:30Z) - A Variational Auto-Encoder Approach for Image Transmission in Wireless
Channel [4.82810058837951]
We investigate the performance of variational auto-encoders and compare the results with standard auto-encoders.
Our experiments demonstrate that optimizing for the SSIM metric visually improves the quality of the reconstructed images at the receiver.
arXiv Detail & Related papers (2020-10-08T13:35:38Z) - Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)