Region-of-Interest Based Neural Video Compression
- URL: http://arxiv.org/abs/2203.01978v1
- Date: Thu, 3 Mar 2022 19:37:52 GMT
- Title: Region-of-Interest Based Neural Video Compression
- Authors: Yura Perugachi-Diaz, Guillaume Sautière, Davide Abati, Yang Yang,
Amirhossein Habibian, Taco S Cohen
- Abstract summary: We introduce two models for ROI-based neural video coding.
First, we propose an implicit model that is fed with a binary ROI mask and trained by de-emphasizing the distortion of the background.
We show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI.
- Score: 19.81699221664852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans do not perceive all parts of a scene with the same resolution, but
rather focus on a few regions of interest (ROIs). Traditional Object-Based codecs
take advantage of this biological intuition, and are capable of non-uniform
allocation of bits in favor of salient regions, at the expense of increased
distortion in the remaining areas: such a strategy allows a boost in perceptual
quality under low rate constraints. Recently, several neural codecs have been
introduced for video compression, yet they operate uniformly over all spatial
locations, lacking the capability of ROI-based processing. In this paper, we
introduce two models for ROI-based neural video coding. First, we propose an
implicit model that is fed with a binary ROI mask and trained by
de-emphasizing the distortion of the background. Secondly, we design an
explicit latent scaling method that allows control over the quantization
binwidth for different spatial regions of latent variables, conditioned on the
ROI mask. Through extensive experiments, we show that our methods outperform all our
baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover,
they can generalize to different datasets and to any arbitrary ROI at inference
time. Finally, they do not require expensive pixel-level annotations during
training, as synthetic ROI masks can be used with little to no degradation in
performance. To the best of our knowledge, our proposals are the first
solutions that integrate ROI-based capabilities into neural video compression
models.
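As a rough illustration of the two mechanisms described above, the sketch below shows (i) an ROI-weighted distortion term that de-emphasizes background error and (ii) a mask-conditioned latent scaling step applied before rounding. The weight and gain values, and the use of plain rounding in place of the paper's learned quantization and entropy model, are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def roi_weighted_distortion(x, x_hat, roi_mask, bg_weight=0.1):
    """Implicit-model-style loss term: squared error is weighted by the ROI mask,
    so background distortion (mask == 0) is de-emphasized by bg_weight < 1.
    Expects x, x_hat of shape (B, C, H, W) and a float roi_mask of shape (B, 1, H, W).
    bg_weight is an illustrative hyperparameter, not a value from the paper."""
    se = (x - x_hat) ** 2                          # per-pixel squared error
    w = roi_mask + bg_weight * (1.0 - roi_mask)    # 1 inside the ROI, bg_weight outside
    return (w * se).mean()

def roi_latent_scaling(y, roi_mask, roi_gain=1.0, bg_gain=0.25):
    """Explicit-model-style latent scaling: a per-location gain conditioned on the
    ROI mask rescales the latents before rounding, which widens the effective
    quantization binwidth outside the ROI and so spends fewer bits there.
    Plain rounding stands in for the learned quantization / entropy model."""
    mask = F.interpolate(roi_mask, size=y.shape[-2:], mode="nearest")  # mask at latent resolution
    gain = roi_gain * mask + bg_gain * (1.0 - mask)
    y_hat = torch.round(y * gain) / gain           # quantize with a wider bin outside the ROI
    return y_hat
```

In a full training loop, the rate term from the entropy model would be added to the ROI-weighted distortion to form the usual rate-distortion objective.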
Related papers
- NERV++: An Enhanced Implicit Neural Video Representation [11.25130799452367]
We introduce NeRV++, an enhanced implicit neural representation for videos.
NeRV++ is a more straightforward yet effective enhancement over the original NeRV decoder architecture.
We evaluate our method on the UVG, MCL-JCV, and Bunny datasets, achieving competitive results for video compression with INRs.
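For context, an implicit neural video representation of the kind NeRV and NeRV++ build on maps a frame index to a full frame, so compressing the video reduces to compressing the network weights. The sketch below is a minimal, generic NeRV-like decoder, not the NeRV++ architecture: the positional encoding, channel counts, and upsampling factors are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class TinyNeRVLikeDecoder(nn.Module):
    """Maps a normalized frame index t in [0, 1] to an RGB frame.
    Purely illustrative: layer sizes and the sinusoidal encoding are placeholders."""
    def __init__(self, num_freqs=8, base=(16, 9), channels=64):
        super().__init__()
        self.num_freqs, self.base, self.channels = num_freqs, base, channels
        self.fc = nn.Linear(2 * num_freqs, channels * base[0] * base[1])
        self.ups = nn.Sequential(  # two x4 upsampling stages: 16x9 -> 256x144
            nn.Conv2d(channels, channels * 16, 3, padding=1), nn.PixelShuffle(4), nn.GELU(),
            nn.Conv2d(channels, channels * 16, 3, padding=1), nn.PixelShuffle(4), nn.GELU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, t):  # t: (B,) float tensor of normalized frame indices
        freqs = (2.0 ** torch.arange(self.num_freqs, device=t.device)) * torch.pi
        enc = torch.cat([torch.sin(t[:, None] * freqs), torch.cos(t[:, None] * freqs)], dim=1)
        h = self.fc(enc).view(-1, self.channels, self.base[0], self.base[1])
        return torch.sigmoid(self.ups(h))  # (B, 3, 256, 144) frame
```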
arXiv Detail & Related papers (2024-02-28T13:00:32Z)
- ROI-based Deep Image Compression with Swin Transformers [14.044999439481511]
Encoding the Region Of Interest (ROI) with better quality than the background has many applications, including video conferencing systems.
We propose an ROI-based image compression framework with Swin transformers as the main building blocks of the autoencoder network.
arXiv Detail & Related papers (2023-05-12T22:05:44Z)
- DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos [53.077189668346705]
We propose Difference Neural Representation for Videos (DNeRV).
We analyze this from the perspective of the limitations of function fitting and the importance of frame difference.
DNeRV achieves competitive results against the state-of-the-art neural compression approaches.
arXiv Detail & Related papers (2023-04-13T13:53:49Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
In this work, a simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model.
It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
- Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification [58.20132466198622]
We propose Attentive CutMix, a naturally enhanced augmentation strategy based on CutMix.
In each training iteration, we choose the most descriptive regions based on the intermediate attention maps from a feature extractor.
Our proposed method is simple yet effective, easy to implement and can boost the baseline significantly.
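A minimal sketch of the Attentive CutMix step described above, assuming the attention signal is the channel mean of an intermediate feature map from any pretrained extractor; the grid size, patch count, and label-mixing rule are illustrative defaults rather than the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def attentive_cutmix(x_a, x_b, feat_map, grid=7, top_k=6):
    """Paste the top_k most activated grid patches of x_a onto x_b.
    x_a, x_b: images of shape (B, C, H, W) with H and W divisible by grid.
    feat_map: an intermediate feature map of x_a from a pretrained extractor.
    Returns the mixed images and the label-mixing coefficient lam."""
    B, C, H, W = x_a.shape
    # Channel-mean attention map, resized to a grid x grid heat map
    heat = F.interpolate(feat_map.mean(dim=1, keepdim=True), size=(grid, grid),
                         mode="bilinear", align_corners=False)
    idx = heat.flatten(1).topk(top_k, dim=1).indices      # (B, top_k) most descriptive cells
    ph, pw = H // grid, W // grid
    x_mix = x_b.clone()
    for b in range(B):
        for i in idx[b]:
            r, c = (i // grid).item(), (i % grid).item()
            x_mix[b, :, r*ph:(r+1)*ph, c*pw:(c+1)*pw] = x_a[b, :, r*ph:(r+1)*ph, c*pw:(c+1)*pw]
    lam = top_k / (grid * grid)                           # fraction of x_a in the mixed image
    return x_mix, lam
```

The mixed label would then be `lam * label_a + (1 - lam) * label_b`, as in standard CutMix-style training.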
arXiv Detail & Related papers (2020-03-29T15:01:05Z)
- Generalized Octave Convolutions for Learned Multi-Frequency Image Compression [20.504561050200365]
We propose the first learned multi-frequency image compression and entropy coding approach.
It is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components.
We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
arXiv Detail & Related papers (2020-02-24T01:35:29Z)
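The generalized octave convolution referenced above builds on the octave convolution idea of keeping separate high- and low-frequency feature maps, with the low-frequency branch at reduced resolution and information exchanged between the two at every layer. The sketch below is a plain octave-convolution layer under that reading, not the paper's generalized variant or its entropy coding; the channel split and pooling choices are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv2d(nn.Module):
    """Keeps features as a (high-frequency, low-frequency) pair, with the low branch
    at half resolution, and mixes the two branches with four convolutions.
    alpha is the fraction of channels assigned to the low-frequency branch."""
    def __init__(self, in_ch, out_ch, kernel_size=3, alpha=0.5):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        p = kernel_size // 2
        self.h2h = nn.Conv2d(in_hi, out_hi, kernel_size, padding=p)
        self.h2l = nn.Conv2d(in_hi, out_lo, kernel_size, padding=p)
        self.l2h = nn.Conv2d(in_lo, out_hi, kernel_size, padding=p)
        self.l2l = nn.Conv2d(in_lo, out_lo, kernel_size, padding=p)

    def forward(self, x_hi, x_lo):
        # High-frequency output: high-to-high plus upsampled low-to-high
        y_hi = self.h2h(x_hi) + F.interpolate(self.l2h(x_lo), size=x_hi.shape[-2:], mode="nearest")
        # Low-frequency output: low-to-low plus downsampled high-to-low
        y_lo = self.l2l(x_lo) + self.h2l(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo

# Example: 64 channels split 32/32, low branch at half the spatial resolution
layer = OctaveConv2d(64, 64, alpha=0.5)
y_hi, y_lo = layer(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
```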