Video Coding for Machines with Feature-Based Rate-Distortion
Optimization
- URL: http://arxiv.org/abs/2203.05890v1
- Date: Fri, 11 Mar 2022 12:49:50 GMT
- Title: Video Coding for Machines with Feature-Based Rate-Distortion
Optimization
- Authors: Kristian Fischer, Fabian Brand, Christian Herglotz, André Kaup
- Abstract summary: With the steady improvement of neural networks, more and more multimedia data is not observed by humans anymore.
We propose a standard-compliant feature-based RDO (FRDO) that is designed to increase the coding performance.
We compare the proposed FRDO and its hybrid version HFRDO with different distortion measures in the feature space against the conventional RDO.
- Score: 7.804710977378487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Common state-of-the-art video codecs are optimized to deliver a low bitrate
by providing a certain quality for the final human observer, which is achieved
by rate-distortion optimization (RDO). However, with the steady improvement of
neural networks solving computer vision tasks, more and more multimedia data is
no longer observed by humans but directly analyzed by neural networks. In
this paper, we propose a standard-compliant feature-based RDO (FRDO) that is
designed to increase the coding performance when the decoded frame is analyzed
by a neural network in a video coding for machines scenario. To that end, we
replace the pixel-based distortion metrics in the conventional RDO of VTM-8.0 with
distortion metrics calculated in the feature space created by the first layers
of a neural network. Throughout several tests with the segmentation network
Mask R-CNN and single images from the Cityscapes dataset, we compare the
proposed FRDO and its hybrid version HFRDO with different distortion measures
in the feature space against the conventional RDO. With HFRDO, up to 5.49 %
bitrate can be saved compared to the VTM-8.0 implementation in terms of
Bjøntegaard Delta Rate and using the weighted average precision as the quality
metric. Additionally, allowing the encoder to vary the quantization parameter
results in coding gains for the proposed HFRDO of up to 9.95 % compared to
conventional VTM.
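The core idea lends itself to a short illustration: the Lagrangian cost J = D + λ·R is unchanged, but the distortion D is computed between feature maps of the original and the reconstructed block instead of between pixels. The sketch below is only a minimal stand-in, assuming a torchvision ResNet-50 front end in place of the Mask R-CNN layers used in the paper and a toy candidate ranking outside any actual VTM-8.0 integration.

```python
# Minimal sketch of a feature-based rate-distortion cost (FRDO idea).
# Assumption: a truncated torchvision ResNet-50 stands in for "the first layers
# of a neural network"; the paper uses Mask R-CNN features inside the VTM-8.0
# mode decision, not a Python loop like this.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
# Keep only the stem and the first residual stage as the feature extractor.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool, backbone.layer1
)
for p in feature_extractor.parameters():
    p.requires_grad_(False)

def feature_distortion(original: torch.Tensor, reconstruction: torch.Tensor) -> torch.Tensor:
    """MSE between feature maps of the original and a decoded candidate block.
    In practice the inputs would also get the backbone's ImageNet normalization."""
    with torch.no_grad():
        return F.mse_loss(feature_extractor(reconstruction), feature_extractor(original))

def rd_cost(original, reconstruction, rate_bits: float, lam: float) -> float:
    """Lagrangian cost J = D_feature + lambda * R used to rank coding candidates."""
    return feature_distortion(original, reconstruction).item() + lam * rate_bits

# Usage: rank two hypothetical reconstructions of the same block.
x = torch.rand(1, 3, 128, 128)            # original block (normalized RGB)
cand_a = x + 0.02 * torch.randn_like(x)   # e.g. a low-QP reconstruction
cand_b = x + 0.06 * torch.randn_like(x)   # e.g. a high-QP reconstruction
print(rd_cost(x, cand_a, rate_bits=900.0, lam=0.01),
      rd_cost(x, cand_b, rate_bits=400.0, lam=0.01))
```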
Related papers
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image
Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework paired with a compute-efficient channel-wise auto-regressive prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, estimated on average at 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
arXiv Detail & Related papers (2023-07-12T11:45:54Z) - Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient
Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
The ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
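A minimal sketch of this idea follows; the module shapes and names are purely illustrative and not the VNVC architecture. The point is that a task head consumes the compact latent directly, so no pixel-level reconstruction is needed before analysis.

```python
# Conceptual sketch: analyze a coded representation without decoding to pixels.
# All modules here are toy stand-ins, not the VNVC framework.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Maps a frame to a compact latent (stand-in for a learned video encoder)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=4, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class LatentTaskHead(nn.Module):
    """Runs a vision task (here: a toy classifier) directly on the latent."""
    def __init__(self, ch=64, num_classes=10):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, num_classes),
        )
    def forward(self, y):
        return self.head(y)

frame = torch.rand(1, 3, 256, 256)
latent = TinyEncoder()(frame)       # what would be transmitted/stored
logits = LatentTaskHead()(latent)   # analysis without reconstructing pixels
print(latent.shape, logits.shape)
```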
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
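One common way to penalize such redundancy, offered here only as an illustrative sketch (the exact penalty in the cited paper may differ), is to drive the off-diagonal correlations between bottleneck units toward zero:

```python
# Sketch of an explicit redundancy penalty on an autoencoder bottleneck:
# penalize off-diagonal entries of the correlation matrix of bottleneck units.
import torch

def redundancy_penalty(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """z: (batch, dim) bottleneck activations; returns mean squared off-diagonal correlation."""
    z = z - z.mean(dim=0, keepdim=True)
    z = z / (z.std(dim=0, keepdim=True) + eps)
    corr = (z.T @ z) / z.shape[0]                  # (dim, dim) correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr)) # zero out the diagonal
    return (off_diag ** 2).mean()

# Usage inside a training loop: add this term to the reconstruction loss.
z = torch.randn(32, 16, requires_grad=True)        # a batch of bottleneck vectors
loss = redundancy_penalty(z)                       # + mse(x_hat, x) in practice
loss.backward()
print(loss.item())
```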
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
However, distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions to describe the R-D behavior of NIC using deep networks and statistical modeling.
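As a rough illustration of what characteristic modeling means in practice, the snippet below fits a simple parametric curve D(R) = a·R^b + c to a handful of hypothetical (rate, distortion) measurements; the paper itself derives its functions with deep networks and statistical modeling, so this is only the curve-fitting idea, not its method.

```python
# Sketch: fit a parametric rate-distortion curve D(R) = a * R**b + c to a few
# measured points, then predict the distortion at an unseen rate.
import numpy as np
from scipy.optimize import curve_fit

def rd_model(rate, a, b, c):
    return a * np.power(rate, b) + c

# Hypothetical measurements: bitrate in bpp, distortion as MSE (made-up numbers).
rates = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
dists = np.array([95.0, 60.0, 34.0, 18.0, 9.0])

params, _ = curve_fit(rd_model, rates, dists, p0=(10.0, -1.0, 0.0), maxfev=10000)
print("fitted a, b, c:", params)
print("predicted MSE at 0.6 bpp:", rd_model(0.6, *params))
```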
arXiv Detail & Related papers (2021-06-24T12:23:05Z) - Perceptually-inspired super-resolution of compressed videos [18.72040343193715]
Spatial resolution adaptation is a technique that has often been employed in video compression to enhance coding efficiency.
Recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality.
In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial upsampling of compressed video using a modified CNN model.
arXiv Detail & Related papers (2021-06-15T13:50:24Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations and the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
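The DCT-domain splitting itself can be sketched in a few lines; the branches below are identity placeholders, and the fixed low-frequency mask stands in for the learned, input-dependent division described in the paper.

```python
# Sketch of DCT-domain frequency splitting: route low-frequency content through
# a cheap branch and high-frequency content through an expensive branch.
import numpy as np
from scipy.fft import dctn, idctn

def split_frequencies(block: np.ndarray, keep: int = 8):
    """2D DCT of a block; the top-left keep x keep coefficients form the 'low' part."""
    coeffs = dctn(block, norm="ortho")
    low_mask = np.zeros_like(coeffs, dtype=bool)
    low_mask[:keep, :keep] = True
    low = idctn(np.where(low_mask, coeffs, 0.0), norm="ortho")
    high = idctn(np.where(low_mask, 0.0, coeffs), norm="ortho")
    return low, high

def cheap_branch(x):      # e.g. lightweight filtering; identity placeholder here
    return x
def expensive_branch(x):  # e.g. a deep CNN; identity placeholder here
    return x

block = np.random.rand(32, 32)
low, high = split_frequencies(block, keep=8)
out = cheap_branch(low) + expensive_branch(high)
print(np.allclose(out, block))  # True: the split is exact when both branches are identity
```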
arXiv Detail & Related papers (2021-03-15T12:54:26Z) - Parallelized Rate-Distortion Optimized Quantization Using Deep Learning [9.886383889250064]
Rate-Distortion Optimized Quantization (RDOQ) has played an important role in the coding performance of recent video compression standards such as H.264/AVC, H.265/HEVC, VP9 and AV1.
This work addresses the high sequential complexity of RDOQ using a neural network-based approach, which learns to trade off rate and distortion during offline supervised training.
arXiv Detail & Related papers (2020-12-11T14:28:30Z) - A Variational Auto-Encoder Approach for Image Transmission in Wireless
Channel [4.82810058837951]
We investigate the performance of variational auto-encoders and compare the results with standard auto-encoders.
Our experiments demonstrate that optimizing for the SSIM metric visually improves the quality of the reconstructed images at the receiver.
arXiv Detail & Related papers (2020-10-08T13:35:38Z) - Perceptually Optimizing Deep Image Compression [53.705543593594285]
Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
arXiv Detail & Related papers (2020-07-03T14:33:28Z)